Abstract. We study the expressive power of positive neural networks. The
model uses positive connection weights and multiple input neurons. Different
behaviors can be expressed by varying the connection weights. We show that
in discrete time, and in absence of noise, the class of positive neural networks
captures the so-called monotone-regular behaviors, which are based on regular
languages. A finer picture emerges if one takes into account the delay by which a
monotone-regular behavior is implemented. Each monotone-regular behavior can
be implemented by a positive neural network with a delay of one time unit. Some
monotone-regular behaviors can be implemented with zero delay. And,
interestingly, some simple monotone-regular behaviors cannot be implemented
with zero delay.


1 Introduction 


Positive neural networks. Based on experimental observations, Douglas and
Martin (2004) have proposed an abstract model of the neocortex consisting of
interconnected winner-take-all circuits. Each winner-take-all circuit consists of
excitatory neurons that, besides exciting each other, indirectly inhibit each other
through some inhibition layer. This causes only a few neurons in the circuit to be
active at any time. Kappel et al. (2014) further demonstrate through theoretical
analysis and simulations that the model of interconnected winner-take-all circuits
might indeed provide a deeper understanding of some experimental observations.
In this article, we take two inspirational points from this model, which we discuss
below.

First, although a biological neural network in general contains both excitatory
and inhibitory connections between neurons (Gerstner et al., 2014), excitation
and inhibition are not combined in an arbitrary fashion in the above model of
interconnected winner-take-all circuits. In that model, the meaning seems to be
mostly contained in the excitatory connections whereas inhibitory connections
play a more regulatory role such as controlling how many neurons can become
simultaneously active. Based on this apparent value of excitatory connections, in
this article we are inspired to study neural networks that are simplified to contain
only excitatory connections between neurons. Technically, we consider so-called
positive neural networks, where all connections are given a weight that is either
strictly positive or zero.

Second, it appears useful to study neural network models with multiple input 
neurons. In the model of interconnected winner-take-all circuits, each circuit has 
multiple input neurons that can be concurrently active. This allows each circuit 
to receive rich input symbols. The input neurons of a circuit could receive stimuli 
directly from sensory organs, or from other circuits. It would be fascinating to 
understand how neurons build concepts or recognize patterns over such rich inputs. 

Based on the above inspiration, in this article we study a simple positive
neural network model with multiple input neurons, operating in discrete time. As
mentioned above, the use of nonnegative weights allows only excitation between
neurons and no inhibition. In our model, each positive neural network has
distinguished sets of input neurons, output neurons, and auxiliary neurons. The
network may be recurrent, i.e., the activation of a neuron may indirectly influence
its own future activation. As in some previous models (Sima and Wiedermann,
1998; Sima and Orponen, 2003), we omit noise and learning. We believe that the
omission of inhibition (i.e., negative connection weights) might allow for a
better understanding of the foundations of computation in neural networks, where
different features gradually increase the expressive power (see also Section ??).
Excitation between neurons seems to be a basic feature that we cannot omit. The
omission of inhibition leads to a notion of monotonicity that we will discuss later
in the Introduction.

As a final point for the motivation of the model, we mention that biological
neurons seem to mostly encode information in the timing of their activations
and not in the magnitude of the activation signals (Gerstner et al., 2014). In this
perspective, one may view discrete time models like ours as highlighting the causal
steps of the neuronal computation. The discrete time step could in principle be
chosen very small.


Expressivity study. Our aim in this article is to better understand what
computations can or cannot be performed by positive neural networks. We show that
positive neural networks represent the class of so-called monotone-regular
behaviors. The relevance of this result is discussed later in the Introduction. We first
provide the necessary context.

Many previous works have investigated the expressive power of various kinds
of neural network models (Sima and Orponen, 2003). A common idea is to relate
neural networks to theoretical computation devices like automata (Hopcroft and
Ullman, 1979; Sipser, 2006). A lower bound on the expressiveness of a neural
network model can be established by simulating automata with neural networks in
that model. Conversely, an upper bound on the expressiveness can be established
by simulating neural networks with automata. In previous works, simulations
with neural networks of both deterministic finite automata (Alon et al., 1991;
Indyk, 1995; Omlin and Giles, 1996; Horne and Hush, 1996) and nondeterministic
finite automata (Carrasco et al., 1999) have been studied. Some models of
neural networks even allow the simulation of arbitrary Turing machines, which are
much more powerful than finite automata; see, e.g., (Siegelmann and Sontag, 1995;
Maass, 1996). However, the technical constructions used for the simulation of such
powerful machines are not necessarily biologically plausible.

In this article, our approach in the expressivity study is to describe the
behaviors exhibited by positive neural networks, as follows. An input symbol in our
model is a subset of input neurons that are concurrently active. For example, if
the symbol {a, b, c} is presented as input to the network at time t then this means
that a, b, and c are the only input neurons that are (concurrently) active at time t.
The empty symbol would mean that no input neurons are active. Output symbols
are defined similarly, but over output neurons instead. Now, we define behaviors
as functions that transform each sequence of input symbols to a sequence of output
symbols. Different behaviors can be expressed by varying the connection weights
of a positive neural network. By describing such behaviors, we can derive
theoretical upper and lower bounds on the expressivity of positive neural networks. We
emphasize that we feed sequences of input symbols to the neural networks, not
single symbols.

Our assumption of multiple input neurons that may become concurrently active
is in contrast to models of past expressivity studies, where input encodings
were used that either (i) made only one input neuron active at any given time (Sima
and Wiedermann, 1998; Carrasco et al., 1999, 2000) or (ii) presented a single bit
string just once over multiple input neurons, after which they remained silent (Sima
and Orponen, 2003). Essentially, multiple parallel inputs versus a single sequential
input is a matter of input alphabet. One might propose that an external process
could transform multiple parallel inputs to a single sequential one (say, a stream of
bits), after which previous results might be applied, e.g. (Sima and Wiedermann,
1998). However, in a biologically plausible setting there is no such external process:
in general, it seems that inputs arrive from multiple sensory organs in parallel, and
an internal circuit receives inputs from multiple other internal circuits in parallel,
as remarked at the beginning of the Introduction. Because our aim is to better
understand the biologically plausible setting, we have to work with an
input alphabet where multiple (parallel) input neurons may become concurrently
active.


Monotone-regular behaviors. To describe the behaviors exhibited by positive
neural networks, we use the class of regular languages, which are those languages
recognized by finite automata (Hopcroft and Ullman, 1979; Sipser, 2006).
Previously, Sima and Wiedermann (1998) have shown that neural networks in discrete
time that read bit strings over a single input neuron recognize whether prefixes of
the input string belong to a regular language or not. In their technical
construction, Sima and Wiedermann essentially simulate nondeterministic finite automata.

In this article, we simulate nondeterministic finite automata in the setting
of positive neural networks.^1 Using the simulation, we show that the class of
positive neural networks captures the so-called monotone-regular behaviors. A
monotone-regular behavior describes the activations of each output neuron with
a regular language over input symbols, where each symbol may contain multiple
input neurons as described above. Monotonicity means that each output neuron
is activated whenever strings of the regular language are embedded in the input,
regardless of any other activations of input neurons. Phrased differently, enriching
an input with more activations of input neurons will never lead to fewer activations
of output neurons. Monotonicity arises because neurons only excite each other and
do not inhibit each other. This notion did not appear explicitly in the work by
Sima and Wiedermann (1998) because their neural networks exactly recognize
regular languages over the single input neuron by using inhibition (i.e., negative
connection weights): inhibition makes it possible to explicitly test for the absence
of input activations at certain times.

^1 For example, the finite automaton in Figure 1a is simulated by the positive neural
network in Figure 1b.

Delay is a standard notion in the study of neural networks (Sima and Orponen,
2003). Intuitively, delay is the number of extra time steps needed by the neural
network before it can produce the output symbols prescribed by the behavior.
We show that each monotone-regular behavior can be implemented by a positive
neural network with a delay of one time unit. This result is in line with the result
by Sima and Wiedermann (1998), but it is based on a new technical construction to
deal with the more complex input symbols generated by concurrently active input
neurons. We simulate automaton states by neurons as expected, but we design
the weights of the incoming connections to a neuron to express simultaneously
(i) an “or” over context neurons that provide working memory and (ii) an “and”
over all input neurons mentioned in an input symbol. As in the work by Sima
and Wiedermann (1998), the constructed neural network may activate auxiliary
neurons in parallel. Accordingly, our simulation preserves the nondeterminism, or
parallelism, of the simulated automaton. As an additional result, we show that
a large class of monotone-regular behaviors can be implemented with zero delay.
And, interestingly, some simple monotone-regular behaviors can provably not be
implemented with zero delay.

To the best of our knowledge, the notion of monotone-regular behaviors is
introduced in this article for the first time. But this notion is a natural
combination of some previously existing concepts, results, and general intuition, as
follows. First, it is likely that both the temporal structure and spatial structure
of sensory inputs are important for biological organisms (Buonomano and Maass,
2009). The temporal structure describes the timing of sensory events, and the
spatial structure describes which and how many neurons are used to represent
each sensory event. Second, the well-known class of regular languages from
formal language theory describes symbol sequences that exhibit certain patterns or
regularities (Hopcroft and Ullman, 1979); temporal structure is represented by
the ordering of symbols, and spatial structure is given by the individual symbols.
The relationship between regular languages and neural network models has also
been investigated before (Sima and Wiedermann, 1998). Third, without
inhibition, neurons only excite each other and therefore an increased activity of input
neurons will not lead to a decreased activity of output neurons. Without
inhibition, neurons will respond to patterns embedded in the input stream regardless
of any other simultaneous patterns, giving rise to a form of monotonicity on the
resulting behavior.


Relevance. We conclude the Introduction by placing our result in a larger
picture. The intuition explored in this article is that neural networks in some sense
represent grammars. A grammar is any set of rules describing how to form
sequences of symbols over a given alphabet; such sequences may be called sentences.

In an experiment by Reber (1967), subjects were shown sentences generated by
an artificial grammar, but the rules of the grammar were not shown. Subjects were
better at memorizing and reproducing sentences generated by the grammar when
compared to sentences that are just randomly generated. Moreover, subjects were
generally able to classify sentences as being grammatical or not. Interestingly,
however, subjects could not verbalize the underlying rules of the grammar. This
experiment suggests that organisms learn patterns from the environment when the
patterns are sufficiently repeated. Those patterns get embedded into the neural
network. The resulting grammar cannot necessarily be described or explicitly
accessed by the organism.

The grammar hypothesis is to some extent confirmed by neuronal recordings
of brain areas involved with movement planning in monkeys (Shima and Tanji,
2000; Isoda and Tanji, 2003). These experimental findings suggest that movement
sequences are represented by two groups of neurons: the first group represents the
temporal structure, and the second group represents individual actions. Neurons
in the first group might be viewed as stringing together the output symbols
represented by the second group. Hence, the first group might represent the structure
of a grammar, indicating the allowed sentences of output symbols.


The above experiments are complemented by Kappel et al. (2014), who have
theoretically shown and demonstrated with computer simulations that neural
winner-take-all circuits can (learn to) express hidden Markov models. Hidden
Markov models are finite state machines with transition probabilities between
states, and each state has a certain probability to emit symbols. Such models
describe grammars, because each visited state can contribute symbols to an
increasing sentence. One of the insights by Kappel et al. is that by repeatedly
showing sentences generated by a hidden Markov model to a learning
winner-take-all circuit, the states of the Markov model are eventually encoded by global
network states, i.e., groups of activated neurons. This way, the neural network
builds an internal model of how sentences are formed by the hidden grammar.
Interestingly, the computer simulations by Kappel et al. (2014) clearly demonstrate
(and visualize) that neurons learn to cooperate in a chain-like fashion, expressing
the symbol chains in the hidden grammar. This corresponds well to the earlier
predictions (Reber, 1967; Shima and Tanji, 2000; Isoda and Tanji, 2003). We
might speculate that, if one assumes a real-world environment to be a (complex)
hidden Markov model, organisms with a neural network can learn to understand
the patterns, or sentences, generated by that environment.

In this article, we have made the above grammar intuition formal for positive
neural networks. By characterizing the expressive power of positive neural
networks with monotone-regular behaviors, the activation of an output neuron may
be viewed as the recognition of a pattern in the input. This way, each output
neuron represents a grammar: the output neuron recognizes which input sentences
satisfy the grammar. Moreover, our finding that nondeterministic finite automata
can be simulated by positive neural networks is in line with the expressivity result
of Kappel et al. (2014) because hidden Markov models generalize nondeterministic
automata (Dupont et al., 2005): in a standard nondeterministic automaton, all
successor states of a given state are equally likely, whereas a hidden Markov model
could assign different transition probabilities to each successor state. The
simulation of automata by previous works (Sima and Orponen, 2003) and the current
article, combined with the result by Kappel et al. (2014), might provide a
useful intuition: individual neurons or groups of neurons could represent automaton
states of a grammar.


Outline. This article is organized as follows. We discuss related work in
Section 2. We provide in Section 3 the necessary preliminaries, including the
formalization of positive neural networks and monotone-regular behaviors. Next,
we provide in Section 4 our results regarding the expressivity of positive neural
networks. We conclude in Section 5 with topics for future work.


2 Related Work 


We now discuss several theoretical works that are related to this article. 

The relationship between the semantic notion of monotonicity and the
syntactic notion of positive weights is natural, and has been explored in other settings
([...], 2008; Daniels and Velikova, 2010). In particular, the paper by Legenstein and
Maass (2008) studies more [...] neurons. In their setting, fixing some natural
number $n$, there is one output neuron that is given points from $\mathbb{R}^n$ as
presynaptic input. Each choice of weights from $\mathbb{R}^n$ allows the output neuron to
express a binary (true-false) classification of input points, where “true” is
represented by the activation of the output neuron. By imposing sign-constraints on the
weights, different families of output neurons are created. For example, one could
demand that only positive weights are used. It turns out that the VC-dimension
of sign-constrained neurons with $n$ presynaptic inputs is $n$, which is only one
less than unconstrained neurons.^2 Moreover, Legenstein and Maass (2008)
characterize the input sets (containing $n$ points from $\mathbb{R}^n$) for which
sign-constrained neurons can express all binary classification functions.

^2 For example, for the case of positive weights, the VC-dimension $n$ tells us that there is an
input set $S \subseteq \mathbb{R}^n$ with $|S| = n$ that can be shattered by the family of positive
presynaptic weights, in the following sense: for each classification $h : S \to \{1, 0\}$, there exists
a positive presynaptic weight vector in $\mathbb{R}^n$ allowing the resulting single output neuron
to express $h$ on $S$.

Like in the Introduction, we define an input symbol as a set of concurrently
active input neurons. The results by Legenstein and Maass (2008) can be used
to better understand the nature of input symbols that are distinguishable from
each other by a single output neuron having nonnegative presynaptic weights,
also referred to as a positive neuron. Indeed, if we would receive a stream of input
symbols and if we would like to individually classify each input symbol by the
activation behavior of a positive output neuron (where activation means “true”),
the results by Legenstein and Maass (2008) provide sufficient and necessary
conditions on the presented input symbols to allow the output neuron to implement
the classification.

It is also possible, however, to consider a temporal context for
each input symbol; then, the decision to activate an output neuron for a certain
input symbol depends on which input symbols were shown previously. For
example, considering an alphabet of input symbols A, B, and C, we might want to
activate the output neuron on symbol B while witnessing the string (A, A, B) but
not while only witnessing the string (C, C, B). Hence, the output activation for
symbol B depends on the temporal context in which B appears. For a maximum
string length k, and assuming that input symbols have maximum size n, one could
in principle present a string of k symbols to the output neuron in a single glance,
using n times k (new) input neurons. In that approach, the output neuron could
even recognize strings of a length l < k that are embedded into the presented
string, giving rise to the notion of monotonicity discussed in this article. For such
cases, the results by Legenstein and Maass (2008) could still be applied to better
understand the nature of strings that can be recognized by a positive output
neuron.

As mentioned in the Introduction, in this article we use regular languages
to describe the strings of input symbols upon which an output neuron should
become activated. In contrast to fixing a maximum length k on strings, regular
languages can describe arbitrarily long strings, by allowing arbitrary repetitions
of substrings. By applying our expressivity lower bound (Theorem 4.5), we can
for example construct a neural network that activates an output neuron whenever
a string of the form A*B is embedded at the end of the so-far witnessed stream
of input symbols, where A* denotes that symbol A may be repeated an arbitrary
number of times. Moreover, the neural network can be constructed in such a way
that the output neuron responds with a delay of at most one time unit compared
to the pattern's appearance. Results regarding regular languages, in combination
with delay, can be analyzed in the framework of the current article. Instead of
single output neurons, we consider larger networks where auxiliary neurons can
assist the output neurons by reasoning over the temporal context of the input
symbols.

Positive neural networks are also related to the monotone acyclic AND-OR
Boolean circuits, studied e.g. by Alon and Boppana (1987). Concretely, an AND-OR
circuit is a directed acyclic graph whose vertices are gates that compute either
an OR or an AND of the Boolean signals generated by the predecessor gates. The
input to the circuit consists of a fixed number of Boolean variables. Each AND-OR
circuit is a special case of a positive neural network: each AND and OR gate can
be translated to a neuron performing the same computation, by applying positive
edge weights to presynaptic neurons.
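
As an illustration of this translation, the following Python sketch models a neuron
as a Boolean linear threshold unit with nonnegative weights and instantiates it as an
AND gate and as an OR gate. The helper names, the unit weights, and the convention
that a neuron fires when the weighted sum is at least the threshold are assumptions
made for this example; they are not definitions taken from the article.

```python
from itertools import product


def threshold_neuron(weights, threshold):
    """Return a Boolean linear threshold unit with nonnegative weights."""
    # Assumption of this sketch: a positive neuron has no negative weights.
    assert all(w >= 0 for w in weights)

    def fire(inputs):
        # Weighted sum over the presynaptic neurons that are currently active.
        total = sum(w for w, active in zip(weights, inputs) if active)
        return total >= threshold

    return fire


def and_gate(n):
    # With unit weights, all n presynaptic neurons are needed to reach n.
    return threshold_neuron([1.0] * n, threshold=float(n))


def or_gate(n):
    # With unit weights, one active presynaptic neuron already reaches 1.
    return threshold_neuron([1.0] * n, threshold=1.0)


# Exhaustive check on three inputs against Python's built-in all() and any().
for bits in product([False, True], repeat=3):
    assert and_gate(3)(bits) == all(bits)
    assert or_gate(3)(bits) == any(bits)
```

Both gates use the same weights and differ only in the threshold, which is one way
to see why no negative weight is needed for either of them.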

The neurons studied in the current article compute a Boolean linear threshold
function of their presynaptic inputs: each neuron computes a weighted sum of
the (Boolean) activations of its presynaptic neurons and becomes activated when
that sum passes a threshold. Now, the acyclic AND-OR-NOT circuits discussed
by Parberry (1994) are related to the AND-OR circuits mentioned above.^3 It
turns out that every Boolean linear threshold function over n input variables can
be computed by an acyclic AND-OR-NOT circuit with a number of gates that
is polynomial in n and with a depth that is logarithmic in n.^4 One may call
such circuits “small”, although not of constant size in n. The essential idea in
this transformation is that AND-OR-NOT circuits can compute the sum of the
weights for which the corresponding presynaptic input is true, and subsequently
compare that sum to a threshold; a binary encoding of the weights and threshold
can be embedded into the circuit, but care is taken to ensure that this encoding is
of polynomial size in n. It appears, however, that NOT gates play a crucial role
in the construction, for handling the carry bit in the summation. The resulting
circuit is therefore not positive (or monotone) in the sense of Alon and Boppana
(1987). For completeness, we remark that delay is increased if one would replace
each Boolean linear threshold neuron with a corresponding AND-OR-NOT
sub-circuit, at least if one time unit is consumed for calculating each gate of each
sub-circuit. Given that the transformation produces sub-circuits of non-constant
depth, it appears nontrivial to describe the overall delay exhibited by the network.

^3 Parberry (1994) actually refers to AND-OR-NOT circuits as AND-OR circuits because NOT
gates can be pushed to the first layer, which can be used to establish a normal form where layers
of AND gates alternate with layers of OR gates (with negation only at the first level).

^4 In particular, we are referring to Theorem 7.4.7 (and subsequently Corollary 6.1.6) of
Parberry (1994).

Horne and Hush (1996) show upper and lower bounds on the number of
required neurons for simulating deterministic finite automata that read and write
sequences of bits. Their approach is to encode the state transition function of an
automaton as a Boolean function, which is subsequently implemented by an acyclic
neural network.^5 Each execution of the entire acyclic neural network corresponds
to one update step of the simulated automaton. A possible advantage of the
method by Horne and Hush (1996) is that the required number of neurons could
be smaller than the number of automaton states. But, like in the discussion of
AND-OR-NOT circuits above, the construction introduces a nontrivial delay in
the simulation of the automaton if each neuron (or each layer of neurons) is viewed
as consuming one time unit. In this article we are not necessarily concerned with
compacting automaton states in as few neurons as possible, but we are instead
interested in recognizing a regular language under a maximum delay constraint
(of one time unit) in the setting where multiple input neurons can be
concurrently active and produce a stream of complex input symbols. The construction
by Horne and Hush (1996) can be modified to handle multiple input neurons that may
become concurrently active.

^5 If there are $m$ automaton states then each state can be represented by
$\lceil \log_2 m \rceil$ bits.

For completeness, we remark that in this article we do not impose the
restriction that the (simulated) automata are deterministic. Moreover, in our simulation
of automata, we take care to only introduce a polynomial increase in the number of
neurons compared to the original number of automaton states (see Theorem 4.5).
In particular, if the original automaton is nondeterministic, the constructed
neural network for this automaton will preserve that nondeterminism in the form of
concurrently active neurons. This stems from our original motivation to propose
a construction that could in principle be biologically plausible, where multiple
neurons could be active in parallel to explore the different states of the original
automaton. From this perspective, implementing a deterministic solution, where
only one neuron is active at any given moment, would be less interesting.

Monotonicity in the context of automata has appeared earlier in the work by
Gecseg and Imreh (2001). There, an automaton is called monotone if there exists
a partial order $\le$ on the automaton states, such that each transition $(a, x, a')$,
going from state $a$ to state $a'$ through symbol $x$, satisfies $a \le a'$. Intuitively, this
condition prohibits cycles between two different states while parsing a string. In
particular, the same state $a$ may not be reused except when the previous state
was already $a$ (i.e., self-looping on $a$ is allowed for a while). A language is called
monotone when there is a monotone automaton that recognizes it. This notion
of monotonicity is not immediately related to the current article, because our
notion of monotonicity is not defined on automata (nor on neural networks) but
on behaviors, which formalize semantics separate from the actual computation
mechanism. Moreover, the positive neural networks studied in this article may
reuse the same global state while processing an input string, where a global state
is defined as the set of currently activated neurons. For example, the empty global
state could occur multiple times while processing an input string, even when this
empty global state has precursor states and successor states that are not empty.


3 Preliminaries 

3.1 Finite Automata and Regular Languages 


We recall the definitions of finite automata and regular languages (Sipser, 2006).
An alphabet $\Sigma$ is a finite set. A string $\alpha$ over $\Sigma$ is a finite sequence of elements
from $\Sigma$. The empty string corresponds to the empty sequence. We also refer to
the elements of a string as its symbols. A language $\mathcal{L}$ over $\Sigma$ is a set of strings
over $\Sigma$. Languages can be finite or infinite.

The length of a string $\alpha$ is denoted $|\alpha|$. For each $i \in \{1, \ldots, |\alpha|\}$, we write
$\alpha_i$ to denote the symbol of $\alpha$ at position $i$. We use the following string notation:
$\alpha = (\alpha_1, \ldots, \alpha_{|\alpha|})$. For each $i \in \{1, \ldots, |\alpha|\}$, let $\alpha_{\le i}$ denote the
prefix $(\alpha_1, \ldots, \alpha_i)$.

A (finite) automaton is a tuple $M = (Q, \Sigma, \delta, q^{s}, F)$ where

• $Q$ is a finite set of states;

• $\Sigma$ is an alphabet;

• $\delta$ is the transition function, mapping each pair $(q, S) \in Q \times \Sigma$ to a subset
of $Q$;^6

• $q^{s}$ is the start state, with $q^{s} \in Q$; and,

• $F \subseteq Q$ is the set of accepting states.

^6 Importantly, this subset could be empty.

Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ be a string over $\Sigma$. We call a sequence of states
$q_1, \ldots, q_{n+1}$ of $M$ a run of $M$ on $\alpha$ if the following conditions are satisfied:

• $q_1 = q^{s}$; and,

• $q_i \in \delta(q_{i-1}, \alpha_{i-1})$ for each $i \in \{2, \ldots, n+1\}$.

We say that the run $q_1, \ldots, q_{n+1}$ is accepting if $q_{n+1} \in F$. We say that the
automaton $M$ accepts $\alpha$ if there is an accepting run of $M$ on $\alpha$.^7 Automaton $M$ could be
nondeterministic: for the same input string $\alpha$, there could be multiple accepting
runs. See also Remark 3.1 below.

We define the language $\mathcal{L}$ over $\Sigma$ that is recognized by $M$: language $\mathcal{L}$ is the
set of all strings over $\Sigma$ that are accepted by $M$. Now, a language is said to be
regular if it is recognized by an automaton.
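
The definitions of run and acceptance above can be sketched in Python as follows.
The dictionary encoding of the transition function (a missing key stands for the
empty successor set), the function names, and the small example automaton are
assumptions chosen for illustration only.

```python
def runs(delta, start, alpha):
    """Yield every run q_1, ..., q_{n+1} of the automaton on the string alpha."""

    def extend(prefix, i):
        if i == len(alpha):
            yield tuple(prefix)
            return
        # delta maps (state, symbol) to a set of successor states;
        # a missing key is read as the empty set, so the run cannot continue.
        for q in sorted(delta.get((prefix[-1], alpha[i]), set())):
            yield from extend(prefix + [q], i + 1)

    yield from extend([start], 0)


def accepts(delta, start, accepting, alpha):
    """True if at least one run on alpha ends in an accepting state."""
    return any(run[-1] in accepting for run in runs(delta, start, alpha))


# Hypothetical nondeterministic automaton over {"A", "B"} that accepts
# exactly the strings ending in "B".
delta = {("s", "A"): {"s"}, ("s", "B"): {"s", "f"}}

for run in runs(delta, "s", ("B", "B")):
    print(run)
print(accepts(delta, "s", {"f"}, ("A", "B")))
```

Enumerating runs one by one follows the definition literally but can take
exponential time in the length of the string, so it is meant as a reading aid
rather than as an efficient procedure.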


Remark 3.1. We call an automaton $M = (Q, \Sigma, \delta, q^{s}, F)$ deterministic if
$|\delta(q, S)| = 1$ for each $(q, S) \in Q \times \Sigma$, i.e., the successor state is uniquely defined
for each combination of a predecessor state and an input symbol. Nondeterministic
automata are typically smaller and easier to understand compared to deterministic
automata (Sipser, 2006). Moreover, if $M$ is nondeterministic then it represents
parallel computation. To see this, we can define an alternative but equivalent
semantics for $M$ as follows (Sipser, 2006). The parallel run of $M$ on an input string
$\alpha = (\alpha_1, \ldots, \alpha_n)$ over $\Sigma$ is the sequence

    $P_1, \ldots, P_{n+1}$,

where $P_1 = \{q^{s}\}$ and
$P_i = \{q_i \in Q \mid \exists q_{i-1} \in P_{i-1} \text{ with } q_i \in \delta(q_{i-1}, \alpha_{i-1})\}$ for
each $i \in \{2, \ldots, n+1\}$. We say that $M$ accepts $\alpha$ under the parallel semantics if
the last state set of the parallel run contains an accepting state. It can be shown
that the parallel semantics is equivalent to the semantics of acceptance given
earlier. Because nondeterministic automata explore multiple states simultaneously
at runtime, they appear to be a natural model for understanding parallel
computation in neural networks (see Section 4.2). □
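
The parallel run of Remark 3.1 can be computed directly by updating one set of
states per input symbol. The sketch below assumes the same hypothetical encoding
as before: the transition function is a dictionary from (state, symbol) pairs to
sets of successor states, and a missing key stands for the empty set.

```python
def parallel_run(delta, start, alpha):
    """Return the state sets P_1, ..., P_{n+1} of the parallel run on alpha."""
    current = frozenset({start})  # P_1 contains only the start state
    sets = [current]
    for symbol in alpha:
        # P_i collects every successor of a state in P_{i-1} under the symbol.
        current = frozenset(
            q_next
            for q in current
            for q_next in delta.get((q, symbol), set())
        )
        sets.append(current)
    return sets


def accepts_parallel(delta, start, accepting, alpha):
    """True if the last state set of the parallel run has an accepting state."""
    return bool(parallel_run(delta, start, alpha)[-1] & set(accepting))


# Hypothetical automaton accepting the strings over {"A", "B"} that end in "B".
delta = {("s", "A"): {"s"}, ("s", "B"): {"s", "f"}}
print(parallel_run(delta, "s", ("A", "B", "A")))
```

Each state set plays the role that a group of concurrently active neurons could
play in a network, which is the intuition the remark appeals to.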


3.2 Behaviors 


We use behaviors to describe computations separate from neural networks.
Regarding notation, for a set $X$, let $\mathcal{P}(X)$ denote the powerset of $X$, i.e., the set of
all subsets of $X$.

Let $\mathcal{I}$ and $\mathcal{O}$ be finite sets, whose elements we may think of as representing
neurons. In particular, the elements of $\mathcal{I}$ and $\mathcal{O}$ are called input and output
neurons respectively. Now, a behavior $B$ over input set $\mathcal{I}$ and output set $\mathcal{O}$ is a
function that maps each nonempty string over alphabet $\mathcal{P}(\mathcal{I})$ to a subset of $\mathcal{O}$.
Regarding terminology, for a string $\alpha$ over $\mathcal{P}(\mathcal{I})$ and an index $i \in \{1, \ldots, |\alpha|\}$,
the symbol $\alpha_i$ says which input neurons are active at (discrete) time $i$. Note that
multiple input neurons can be concurrently active.

^7 Our definition of automata omits the special symbol $\epsilon$, that can be used to visit
multiple states in sequence without simultaneously reading symbols from the input string. This
feature can indeed always be removed from an automaton, without increasing the number of
states (Hopcroft and Ullman, 1979).

For an input string $\alpha = (a_1, \ldots, a_n)$ over $\mathcal{P}(I)$, the behavior $B$ implicitly defines the following output string $\beta = (\beta_1, \ldots, \beta_{n+1})$ over $\mathcal{P}(O)$:

• $\beta_1 = \emptyset$, and

• $\beta_i = B(\alpha^{\le i-1})$ for each $i \in \{2, \ldots, n+1\}$.

So, the behavior has access to the preceding input history when producing each 
output symbol. But an output symbol is never based on future input symbols. 


3.3 Monotone-regular Behaviors 

Let $I$ be a set of input neurons. We call a language $\mathcal{L}$ over alphabet $\mathcal{P}(I)$ founded when each string of $\mathcal{L}$ is nonempty and has a nonempty subset of $I$ for its first symbol. Also, for two strings $\alpha$ and $\beta$ over $\mathcal{P}(I)$, we say that $\alpha$ embeds $\beta$ if $\alpha$ has a suffix $\gamma$ with $|\gamma| = |\beta|$ such that $\beta_i \subseteq \gamma_i$ for each $i \in \{1, \ldots, |\beta|\}$. Note that $\beta$ occurs at the end of $\alpha$. Also note that a string embeds itself according to this definition.
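To make the notions of embedding and foundedness concrete, the following sketch checks them directly. It assumes that a string over $\mathcal{P}(I)$ is represented as a Python list of sets of neuron names; the function names `embeds` and `is_founded_string` are hypothetical.

```python
def embeds(alpha, beta):
    """True if alpha has a suffix gamma with |gamma| = |beta| such that
    each symbol of beta is a subset of the corresponding symbol of gamma."""
    if len(beta) > len(alpha):
        return False
    suffix = alpha[len(alpha) - len(beta):]
    return all(set(b) <= set(g) for b, g in zip(beta, suffix))


def is_founded_string(beta):
    """True if beta is nonempty and its first symbol is a nonempty set,
    i.e., beta may belong to a founded language."""
    return len(beta) > 0 and len(beta[0]) > 0
```

For instance, the string ({a}, {a, b}, {c}) embeds ({b}, {c}): the pattern occurs at the end, surrounded by the superfluous activation of a.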

Let $B$ be a behavior over an input set $I$ and an output set $O$. We call $B$ monotone-regular if for each output neuron $x \in O$ there is a founded regular language $\mathcal{L}(x)$ such that for each nonempty input string $\alpha$ over $\mathcal{P}(I)$,

$$x \in B(\alpha) \iff \alpha \text{ embeds a string } \beta \in \mathcal{L}(x).$$

Intuitively, the regular language $\mathcal{L}(x)$ describes the patterns that output neuron $x$ reacts to. So, the meaning of neuron $x$ is the recognition of language $\mathcal{L}(x)$. We use the term monotone to indicate that $\mathcal{L}(x)$ is recognized within surrounding superfluous activations of input neurons, through the notion of embedding. The restriction to founded regular languages expresses that outputs do not emerge spontaneously, i.e., the activations of output neurons are given the opportunity to witness at least one activation of an input neuron.


Remark 3.2. Let $M$ be an automaton that recognizes a founded regular language over $\mathcal{P}(I)$. When reading the symbol $\emptyset$ from the start state of $M$, we may only enter states from which it is impossible to reach an accepting state; otherwise the recognized language is not founded. See also Lemma 4.4 in Section 4.2. □


Remark 3.3. The definition of monotone-regular behaviors fuses the separate notions of monotonicity and (founded) regular languages. It also seems possible to define monotone-regular behaviors as those behaviors that are both monotone and regular. However, in the formalization of regular behaviors, the regular language of each output neuron $x$ likely has to describe the entire input strings upon which $x$ is activated (at the end). This is in contrast to the current formalization of monotone-regular behaviors, where the (founded) regular language $\mathcal{L}(x)$ could be very small, describing only the patterns that $x$ is really trying to recognize, even when those patterns are embedded in larger inputs. The current formalization is therefore more insightful for our construction in the expressivity lower bound (Theorem 4.5), where we convert an automaton for $\mathcal{L}(x)$ to a neural network that serves as a pattern recognizer for output neuron $x$. The current formalization of monotone-regular behaviors allows the pattern recognizer to be as small as possible. □


3.4 Positive Neural Networks 

We define a neural network model that is related to previous discrete-time models (Sima and Wiedermann, 1998; Sima and Orponen, 2003), but with the following differences: we have no inhibition, and we consider multiple input neurons that are allowed to be concurrently active.

Formally, a (positive) neural network $\mathcal{N}$ is a tuple $(I, O, A, W)$, where

• $I$, $O$, and $A$ are finite and pairwise disjoint sets, containing respectively the input neurons, the output neurons, and the auxiliary neurons;^8

• we let

$$\mathit{edges}(\mathcal{N}) = (I \times O) \cup (I \times A) \cup (A \times O) \cup \{(x, y) \in A \times A \mid x \neq y\}$$

be the set of possible connections; and,

• the function $W$ is the weight function that maps each $(x, y) \in \mathit{edges}(\mathcal{N})$ to a value in $[0, 1]$.

Note that there are direct connections from the input neurons to the output neurons. The weight 0 is used for representing missing connections. Intuitively, the role of the auxiliary neurons is to provide working memory while processing input strings. For example, the activation of an auxiliary neuron could mean that a certain pattern was detected in the input string. Auxiliary neurons can recognize increasingly longer patterns by activating each other (Elman, 1990; Kappel et al., 2014). We refer to Section 4 for constructions involving auxiliary neurons.

We introduce some notations for convenience. If $\mathcal{N}$ is understood from the context, for each $x \in I \cup O \cup A$, we abbreviate

$$\mathit{pre}(x) = \{y \in I \cup A \mid (y, x) \in \mathit{edges}(\mathcal{N}) \text{ and } W(y, x) > 0\}$$

and

$$\mathit{post}(x) = \{y \in O \cup A \mid (x, y) \in \mathit{edges}(\mathcal{N}) \text{ and } W(x, y) > 0\}.$$

We call $\mathit{pre}(x)$ the set of presynaptic neurons of $x$, and $\mathit{post}(x)$ the set of postsynaptic neurons of $x$.

3.4.1 Operational Semantics

Let $\mathcal{N} = (I, O, A, W)$ be a neural network. We formalize how $\mathcal{N}$ processes an input string $\alpha$ over $\mathcal{P}(I)$. We start with the intuition.


^8 Auxiliary neurons are also sometimes called hidden neurons (Sima and Orponen, 2003).

Intuition. We do $|\alpha|$ steps, called transitions, to process all symbols of $\alpha$. At each time $i \in \{1, \ldots, |\alpha|\}$, also referred to as transition $i$, we show the input symbol $\alpha_i$ to $\mathcal{N}$. Specifically, an input neuron $x \in I$ is active at time $i$ if $x \in \alpha_i$. Input symbols could activate auxiliary and output neurons. Auxiliary neurons could in turn also activate other auxiliary neurons and output neurons. Each time a neuron of $I \cup A$ becomes active, it conceptually emits a signal. The signal emitted by a neuron $x$ at time $i$ travels to all postsynaptic neurons $y$ of $x$, and such received signals are processed by $y$ at the next time $i + 1$. Each signal that is emitted by $x$ and received by a postsynaptic neuron $y$ has an associated weight, namely, the weight on the connection from $x$ to $y$. Subsequently, a postsynaptic neuron $y$ emits a (next) signal if the sum of all received signal weights is larger than or equal to a firing threshold. The firing threshold in our model is 1 for all neurons. All received signals are immediately discarded when proceeding to the next time. In the formalization below, the conceptual signals are not explicitly represented, and instead the transitions directly update sets of activated neurons.

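Transitions. A transition of $\mathcal{N}$ is a triple $(N_i, S, N_j)$ where $N_i \subseteq O \cup A$ and $N_j \subseteq O \cup A$ are two sets of activated neurons, $S \in \mathcal{P}(I)$ is an input symbol, and where

$$N_j = \Big\{ y \in O \cup A \;\Big|\; \sum_{z \in \mathit{pre}(y) \cap (N_i \cup S)} W(z, y) \ge 1 \Big\}.$$

We call $N_i$ the source set, $N_j$ the target set, and $S$ the symbol that is read.^9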

Run. The run $\mathcal{R}$ of $\mathcal{N}$ on input $\alpha$ is the unique sequence of $|\alpha|$ transitions for which

• the transition with ordinal $i \in \{1, \ldots, |\alpha|\}$ reads input symbol $\alpha_i$;

• the source set of the first transition is $\emptyset$;

• the target set of each transition is the source set of the next transition.

Note that $\mathcal{R}$ defines $|\alpha| + 1$ sets of activated neurons, including the first source set. We define the output of $\mathcal{N}$ on $\alpha$, denoted $\mathcal{N}(\alpha)$, as the set $N \cap O$ where $N$ is the target set of the last transition in the run of $\mathcal{N}$ on $\alpha$.

It is possible to consider the behavior $B$ defined by $\mathcal{N}$: for each nonempty input string $\alpha$, we define $B(\alpha) = \mathcal{N}(\alpha)$. So, like a behavior, a neural network implicitly transforms an input string $\alpha = (a_1, \ldots, a_n)$ over $\mathcal{P}(I)$ to an output string $\beta = (\beta_1, \ldots, \beta_{n+1})$ over $\mathcal{P}(O)$:

• $\beta_1 = \emptyset$, and

• $\beta_i = \mathcal{N}(\alpha^{\le i-1})$ for each $i \in \{2, \ldots, n+1\}$.
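The operational semantics can be sketched as a short simulation. The representation is an assumption of ours: weights are stored in a dictionary keyed by (presynaptic, postsynaptic) pairs, missing entries stand for weight 0, and exact rational arithmetic (`fractions.Fraction`) is used so that the threshold test is not affected by rounding. The names `step`, `run`, `output`, and `EXAMPLE_W` are hypothetical.

```python
from fractions import Fraction


def step(weights, non_input, active, symbol):
    """One transition: compute the target set from the source set `active`
    and the input symbol `symbol`, using firing threshold 1."""
    sources = set(active) | set(symbol)
    target = set()
    for y in non_input:
        total = sum((w for (z, y2), w in weights.items()
                     if y2 == y and z in sources), Fraction(0))
        if total >= 1:
            target.add(y)
    return target


def run(weights, non_input, alpha):
    """Return the |alpha| + 1 sets of activated neurons, starting from
    the empty source set of the first transition."""
    sets = [set()]
    for symbol in alpha:
        sets.append(step(weights, non_input, sets[-1], symbol))
    return sets


def output(weights, outputs, auxiliary, alpha):
    """The output of the network on alpha: active output neurons in the
    target set of the last transition."""
    non_input = set(outputs) | set(auxiliary)
    return run(weights, non_input, alpha)[-1] & set(outputs)


# Illustrative network: input neurons u and v, auxiliary neuron y, output x.
EXAMPLE_W = {("u", "y"): Fraction(1),
             ("y", "x"): Fraction(1, 2),
             ("v", "x"): Fraction(1, 2)}
```

In this illustrative network, x fires on the input ({u}, {v}) because y remembers the earlier activation of u and combines with v in the second transition.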

^9 We include output neurons in transitions only for technical convenience. It is indeed not essential to include output neurons in the source and target sets, because output neurons have no postsynaptic neurons and their activation can be uniquely deduced from the activations of auxiliary neurons and input neurons.

3.4.2 Design Choices


We discuss the design choices of the formalization of positive neural networks. 
Although the model is simple, we have some preferences in how to formalize it. 

First, the reason for not having connections from output neurons to auxiliary 
neurons is for simplicity, and so that proofs can more cleanly separate the roles of 
neurons. However, connections from output neurons to auxiliary neurons can be 
simulated in the current model by duplicating each output neuron as an auxiliary 
neuron, including its presynaptic weights. 

We exclude self-connections on neurons, i.e., connections from a neuron to itself, because such connections might be less common in biological neural networks.

The connection weights are restricted to the interval $[0, 1]$ to express that there is a maximal strength by which any two neurons can be connected. In biological neural networks, the weight contributed by a single connection, which abstracts a set of synapses, is usually much smaller than the firing threshold (Gerstner et al., 2014). For technical simplicity (cf. Section 4), however, the weights in our model are relatively large compared to the firing threshold.^10 Intuitively, such larger weights represent a hidden assembly of multiple neurons that become active concurrently, causing the resulting sum of emitted weights to be large (Maass, 1996).

We use a normalized firing threshold of 1 for simplicity. Another choice of positive firing threshold could in principle be compensated for by allowing connection weights larger than 1.


3.5 Implementing Behaviors, with Delay 

Let $\mathcal{N} = (I, O, A, W)$ be a neural network. We say that a behavior $B$ is compatible with $\mathcal{N}$ if $B$ is over input set $I$ and output set $O$.

Delay is a standard notion in the expressivity study of neural networks (Sima and Wiedermann, 1998; Sima and Orponen, 2003). We say that $\mathcal{N}$ implements a compatible behavior $B$ with delay $k \in \mathbb{N}$ when for each input string $\alpha$ over $\mathcal{P}(I)$,

• if $|\alpha| \le k$ then $\mathcal{N}(\alpha) = \emptyset$;^11 and,

• if $|\alpha| > k$ then $\mathcal{N}(\alpha) = B(\alpha^{\le m})$ where $m = |\alpha| - k$.

Intuitively, delay is the number of additional time steps that $\mathcal{N}$ needs before it can conform to the behavior. This additional time is provided by reading more input symbols.^12 Note that a zero-delay implementation corresponds to $\mathcal{N}(\alpha) = B(\alpha)$ for all input strings $\alpha$.
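The two conditions in the definition of delay can be tested on a finite sample of input strings, as in the sketch below. This is only a sanity check under our own assumptions: a network and a behavior are both modelled as Python functions from a list of sets to a set, and passing the check on a sample does not prove that the network implements the behavior. All names (`conforms_with_delay`, `toy_behavior`, `delayed_output`, `SAMPLES`) are hypothetical.

```python
def conforms_with_delay(network_output, behavior, k, strings):
    """Check both delay conditions on the given finite sample of strings."""
    for alpha in strings:
        n = len(alpha)
        if n <= k:
            expected = set()  # too short: no output may be produced yet
        else:
            expected = set(behavior(alpha[:n - k]))  # prefix of length n - k
        if set(network_output(alpha)) != expected:
            return False
    return True


def toy_behavior(alpha):
    """Toy behavior: x is produced exactly when u is in the last symbol."""
    return {"x"} if "u" in alpha[-1] else set()


def delayed_output(alpha):
    """Toy stand-in for a network that needs one extra time step."""
    return toy_behavior(alpha[:-1]) if len(alpha) > 1 else set()


SAMPLES = [[{"u"}], [{"u"}, set()], [set(), {"u"}], [{"u"}, {"u"}, set()]]
```

Here `delayed_output` conforms to `toy_behavior` with delay 1 but not with delay 0, since on the one-symbol string ({u}) it has not yet produced x.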

^10 The largest weight is 1, which is equal to the firing threshold; so, a neuron could in principle become activated when only one of its presynaptic neurons is active.

^11 If $k = 0$ then this condition is immediately true because we consider no input strings with length zero.

^12 Suppose $\mathcal{N}$ implements $B$ with delay $k$. Let $\alpha$ be an input string with $|\alpha| > k$. If we consider $\alpha$ as the entire input to the network $\mathcal{N}$, then the last $k$ input symbols of $\alpha$ may be arbitrary; those symbols only provide additional time steps for $\mathcal{N}$ to compute $B(\alpha^{\le m})$ where $m = |\alpha| - k$.

Letting $B$ be the behavior defined by $\mathcal{N}$, note that $\mathcal{N}$ implements $B$ with zero delay.


Remark 3.4. Sima and Wiedermann (1998) show that a neural network recognizing a regular language with delay $k$ over a single input neuron can be transformed into a (larger) neural network that recognizes the same language with delay 1. An assumption in the construction is that the delay $k$ in the original network is caused by paths of length $k$ from the input neuron to output neurons.

The definition of delay in this article is purely semantical: we only look at the timing of output neurons. There could be delay on output neurons, even though there might be direct connections from input neurons to output neurons, because output neurons might cooperate with auxiliary neurons (which might introduce delays).

For completeness, we note that our construction in the expressivity lower bound (Theorem 4.5) does not create direct connections from input neurons to output neurons, and thereby incurs a delay of at least one time unit; but we show that it is actually a delay of precisely one time unit. This construction therefore resembles the syntactical assumption by Sima and Wiedermann (1998). □


4 Expressivity Results 


Our goal is to better understand what positive neural networks can do. Within the discrete-time framework of monotone-regular behaviors, we propose an upper bound on expressivity in Section 4.1; a lower bound on expressivity in Section 4.2; and, in Section 4.3, examples showing that these bounds do not coincide. This separation arises because our analysis takes into account the delay by which a neural network implements a monotone-regular behavior. It turns out that an implementation of zero delay exists for some monotone-regular behaviors, but not for other monotone-regular behaviors. A delay of one time unit is sufficient for implementing all monotone-regular behaviors. As an additional result, we present in Section 4.4 a large class of monotone-regular behaviors that can be implemented with zero delay. If we ignore delay, however, our upper and lower bound results (Sections 4.1 and 4.2 respectively) intuitively say that the class of positive neural networks captures the class of monotone-regular behaviors: the behavior defined by a positive neural network is monotone-regular, and each monotone-regular behavior can be implemented by a positive neural network.


4.1 Upper Bound 

Our expressivity upper bound says that only monotone-regular behaviors can be expressed by positive neural networks. This result is in line with the result by Sima and Wiedermann (1998), with the difference that we now work with multiple input neurons and the notion of monotonicity.


Theorem 4.1. The behaviors defined by positive neural networks are monotone-regular.

Proof. Intuitively, because a positive neural network only has a finite number of subsets of auxiliary neurons to form its memory, the network behaves like a finite automaton. Hence, as is well-known, the performed computation can be described by a regular language (Sima and Wiedermann, 1998). An interesting novel aspect, however, is monotonicity, meaning that output neurons recognize patterns even when those patterns are embedded into larger inputs.

Let $\mathcal{N} = (I, O, A, W)$ be a positive neural network. Let $B$ denote the behavior defined by $\mathcal{N}$. We show that $B$ is monotone-regular. Fix some $x \in O$. We define a founded regular language $\mathcal{L}(x)$ such that for each input string $\alpha$ over $\mathcal{P}(I)$ we have

$$x \in B(\alpha) \iff \alpha \text{ embeds a string } \beta \in \mathcal{L}(x).$$

We first define a deterministic automaton $M$. Let $q^s$ and $q^h$ be two state symbols where $q^s \neq q^h$ and $\{q^s, q^h\} \cap \mathcal{P}(O \cup A) = \emptyset$. We call $q^h$ the halt state because no useful processing will be performed anymore when $M$ gets into state $q^h$ (see below). We concretely define $M = (Q, \Sigma, \delta, q^s, F)$, where

• $Q = \{q^s, q^h\} \cup \mathcal{P}(O \cup A)$;

• $\Sigma = \mathcal{P}(I)$;

• regarding $\delta$, for each $(q, S) \in Q \times \Sigma$,

  - if $q = q^s$ and $S = \emptyset$ then $\delta(q, S) = \{q^h\}$;

  - if $q = q^s$ and $S \neq \emptyset$ then $\delta(q, S) = \{q'\}$ where

    $$q' = \Big\{ y \in O \cup A \;\Big|\; \sum_{z \in \mathit{pre}(y) \cap S} W(z, y) \ge 1 \Big\};$$

  - if $q = q^h$ then $\delta(q, S) = \{q^h\}$;

  - if $q \in \mathcal{P}(O \cup A)$ then $\delta(q, S) = \{q'\}$ where

    $$q' = \Big\{ y \in O \cup A \;\Big|\; \sum_{z \in \mathit{pre}(y) \cap (q \cup S)} W(z, y) \ge 1 \Big\};$$

• $F = \{q \in \mathcal{P}(O \cup A) \mid x \in q\}$.

The addition of state $q^h$ is to obtain a founded regular language: strings accepted by $M$ start with a nonempty input symbol. We define $\mathcal{L}(x)$ as the founded regular language recognized by $M$.^13

Next, let $\alpha = (S_1, \ldots, S_n)$ be a string over $\mathcal{P}(I)$. We show that

$$x \in B(\alpha) \iff \alpha \text{ embeds a string } \beta \in \mathcal{L}(x).$$

^13 The construction in this proof does not necessarily result in the smallest founded regular language $\mathcal{L}(x)$. The activation of $x$ is based on seeing patterns embedded in a suffix of the input, but our construction also includes strings in $\mathcal{L}(x)$ that are extensions of such patterns with arbitrary prefixes (starting with a nonempty input symbol).

Direction 1. Suppose $x \in B(\alpha)$. Because no neurons are activated on empty symbols, we can consider the smallest index $k \in \{1, \ldots, n\}$ with $S_k \neq \emptyset$. Let $\beta = (S_k, \ldots, S_n)$. Clearly $\alpha$ embeds $\beta$. Note that $\mathcal{N}(\beta) = \mathcal{N}(\alpha)$, implying $x \in \mathcal{N}(\beta)$. When giving $\beta$ as input to automaton $M$, we do not enter state $q^h$ since $\beta$ starts with a nonempty input symbol. Subsequently, $M$ faithfully simulates the activated neurons of $\mathcal{N}$. The last state $q$ of $M$ reached in this way corresponds to the last set of activated neurons of $\mathcal{N}$ on $\beta$. Since $x \in \mathcal{N}(\beta)$, we have $q \in F$, causing $\beta \in \mathcal{L}(x)$, as desired.

Direction 2. Suppose $\alpha$ embeds a string $\beta \in \mathcal{L}(x)$. Because $\beta \in \mathcal{L}(x)$, there is an accepting run of $M$ on $\beta$, where the last state is an element $q \in \mathcal{P}(O \cup A)$ with $x \in q$. Since $M$ faithfully simulates $\mathcal{N}$, we have $x \in \mathcal{N}(\beta)$. Because the connection weights of $\mathcal{N}$ are nonnegative, if we extend $\beta$ with more activations of input neurons both before and during $\beta$, like $\alpha$ does, then at least the neurons that were activated on just $\beta$ are activated again. Hence, $x \in \mathcal{N}(\alpha)$, as desired.


Remark. We did not define $Q = O \cup A$ because, when reading an input symbol, the activation of a neuron depends in general on multiple presynaptic auxiliary neurons. That context information might be lost when directly casting neurons as automaton states, because an automaton state is already reached by combining just one predecessor state with a new input symbol. □

The following example demonstrates that an implementation with zero delay is at least achievable for some simple monotone-regular behaviors. In Section 4.4 we will also see more advanced monotone-regular behaviors that can be implemented with zero delay.


Example 4.2. Let $B$ be a monotone-regular behavior over an input set $I$ and an output set $O$ with the following assumption: for each $x \in O$, the founded regular language $\mathcal{L}(x)$ contains just one string. The intuition for $B$ is that a simple chain of auxiliary neurons suffices to recognize increasingly larger prefixes of the single string, and the output neuron listens to the last auxiliary neuron and the last input symbol. There is no delay.

We now define a positive neural network $\mathcal{N} = (I, O, A, W)$ to implement $B$ with zero delay. For simplicity we assume $|O| = 1$, and we denote $O = \{x\}$; we can repeat the construction below in case of multiple output neurons, and the partial results thus obtained can be placed into one network. Denote $\mathcal{L}(x) = \{(S_1, \ldots, S_n)\}$, where $S_1 \neq \emptyset$. If $n = 1$ then we define $A = \emptyset$ and, letting $m = |S_1|$, we define $W(u, x) = 1/m$ for each $u \in S_1$; all other weights are set to zero. We can observe that $\mathcal{N}(\alpha) = B(\alpha)$ for each input string $\alpha$ over $\mathcal{P}(I)$.

Now assume $n \ge 2$. We define $A$ to consist of the pairwise different neurons $y_1, \ldots, y_{n-1}$, with the assumption $x \notin A$. Intuitively, neuron $y_1$ should detect symbol $S_1$. Next, for each $i \in \{2, \ldots, n-1\}$, neuron $y_i$ is responsible for detecting symbol $S_i$ when the prefix $(S_1, \ldots, S_{i-1})$ is already recognized; this is accomplished by letting $y_i$ also listen to $y_{i-1}$. We specify weight function $W$ as follows, where any unspecified weights are assumed to be zero:

• For neuron $y_1$, letting $m = |S_1|$, we define $W(u, y_1) = 1/m$ for each $u \in S_1$;

• For neuron $y_i$ with $i \in \{2, \ldots, n-1\}$, letting $m = |S_i| + 1$, we define $W(u, y_i) = 1/m$ for each $u \in \{y_{i-1}\} \cup S_i$;

• For neuron $x$, letting $m = |S_n| + 1$, we define $W(u, x) = 1/m$ for each $u \in \{y_{n-1}\} \cup S_n$.

Also for the case $n \ge 2$, we can observe that $\mathcal{N}(\alpha) = B(\alpha)$ for each input string $\alpha$ over $\mathcal{P}(I)$. □
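The chain construction of Example 4.2 can be tried out with the following sketch. It makes several assumptions of our own: the single pattern is given as a list of sets of input neuron names, the auxiliary neurons are named "y1", "y2", and so on, the output neuron is named "x", and none of these names is also used for an input neuron. The function names `build_chain` and `network_output` are hypothetical.

```python
from fractions import Fraction


def build_chain(pattern, out="x"):
    """Build the weights for the single string (S_1, ..., S_n) in `pattern`.

    Returns (weights, aux) where weights maps (source, target) pairs to
    rational weights and aux lists the chain neurons y_1, ..., y_{n-1}.
    """
    n = len(pattern)
    aux = [f"y{i}" for i in range(1, n)]
    weights = {}
    for i, symbol in enumerate(pattern, start=1):
        target = out if i == n else aux[i - 1]
        listens_to = set(symbol)
        if i > 1:
            listens_to.add(aux[i - 2])  # the previous chain neuron y_{i-1}
        m = len(listens_to)
        for u in listens_to:
            weights[(u, target)] = Fraction(1, m)
    return weights, aux


def network_output(weights, aux, out, alpha):
    """Simulate the network on alpha (threshold 1) and return its output."""
    active = set()
    for symbol in alpha:
        sources = active | set(symbol)
        active = {y for y in set(aux) | {out}
                  if sum((w for (z, t), w in weights.items()
                          if t == y and z in sources), Fraction(0)) >= 1}
    return active & {out}


# Illustrative pattern: first a and b together, then c.
PATTERN = [{"a", "b"}, {"c"}]
CHAIN_W, CHAIN_AUX = build_chain(PATTERN)
```

With this pattern, the output neuron fires immediately after the second symbol of ({a, b}, {c}), also when superfluous input neurons are active, and it stays silent when only part of the first symbol was shown.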


4.2 Lower Bound 


The expressivity lower bound (Theorem 4.5 below) complements the expressivity upper bound (Theorem 4.1). We first introduce some additional terminology and definitions.


4.2.1 Clean Automata

The construction in the expressivity lower bound is based on translating automata to neural networks. The lemmas below allow us to make certain technical assumptions on these automata, making the translation to neural networks more natural.

We say that an automaton $M = (Q, \Sigma, \delta, q^s, F)$ contains a self-loop if there is a pair $(q, S) \in Q \times \Sigma$ such that $q \in \delta(q, S)$. The following lemma tells us that self-loops can be removed:

Lemma 4.3. Every regular language recognized by an automaton $M_1$ is also recognized by an automaton $M_2$ that (i) contains no self-loops, and (ii) uses at most double the number of states of $M_1$.^14

Proof. Denote $M_1 = (Q_1, \Sigma_1, \delta_1, q_1^s, F_1)$. The idea is to duplicate each state involved in a self-loop, so that looping over the same symbol is still possible but now uses two states. Let $V$ be the set of all states of $M_1$ involved in a self-loop:

$$V = \{q \in Q_1 \mid \exists S \in \Sigma_1 \text{ with } q \in \delta_1(q, S)\}.$$

Let $f$ be an injective function that maps each $q \in V$ to a new state $f(q)$ outside $Q_1$. To construct $M_2$, we use the state set $Q_1 \cup \{f(q) \mid q \in V\}$; the same start state as $M_1$; and, the accepting state set $F_1 \cup \{f(q) \mid q \in F_1 \cap V\}$. For the new transition function, each pair $(q, S) \in Q_1 \times \Sigma_1$ with $q \in \delta_1(q, S)$ is mapped to $\{f(q)\} \cup (\delta_1(q, S) \setminus \{q\})$, and $(f(q), S')$ is mapped to $\delta_1(q, S')$ for each $S' \in \Sigma_1$.^15 An odd number of repetitions over symbol $S$ is possible because we have copied all outgoing transitions of $q$ to $f(q)$. All other pairs $(q, S) \in Q_1 \times \Sigma_1$ with $q \notin \delta_1(q, S)$ are mapped as before. □

^14 Intuitively, the quantification of the number of states indicates that in general $M_2$ preserves the nondeterminism of $M_1$.

^15 If $q \in \delta_1(q, S')$ then we can go back from the new state $f(q)$ to the old state $q$ by reading symbol $S'$.

For founded regular languages, Lemma 4.4 (below) tells us that the symbol $\emptyset$ does not have to be read from the start state. Intuitively, this last assumption means that activated states of an automaton can be simulated by neurons: the activations of input neurons in the first input symbol can be propagated through the neural network to keep track of any further progress, even if subsequent input symbols are empty.


Lemma 4.4. Letting $I$ be an input set, every founded regular language over $\mathcal{P}(I)$ recognized by an automaton $M_1$ is also recognized by an automaton $M_2 = (Q_2, \Sigma_2, \delta_2, q_2^s, F_2)$ where (i) $\delta_2(q_2^s, \emptyset) = \emptyset$, and (ii) $M_2$ has the same states as $M_1$.

Proof. The automaton $M_2$ is almost exactly the same as $M_1$, except that the state-symbol combination $(q_2^s, \emptyset)$ is mapped by the transition function to $\emptyset$, i.e., it is impossible to read the empty symbol from the start state. We can immediately see that all accepting runs of $M_2$ are also accepting runs of $M_1$ because $M_1$ includes all transition possibilities of $M_2$.

For the other direction, towards a contradiction, suppose there is an accepting run $q_1, \ldots, q_{n+1}$ of $M_1$ on a string $\alpha = (S_1, \ldots, S_n)$ but this run is not an accepting run of $M_2$. Because in $M_2$ we have only removed the option to read symbol $\emptyset$ from the start state, there has to be some $i \in \{1, \ldots, n\}$ such that $S_i = \emptyset$ and $q_i$ is the start state (of $M_1$, and $M_2$). Now, note that the state sequence $q_i, \ldots, q_{n+1}$ is an accepting run of $M_1$ on the suffix $\beta = (S_i, \ldots, S_n)$. But since $S_i = \emptyset$, automaton $M_1$ would not recognize a founded regular language, which is a contradiction. □

Let $M$ be as above. A state $q \in Q$ is said to be reachable if there is a string $\alpha$ over $\Sigma$ and a run of $M$ on $\alpha$ in which $q$ appears; this run does not have to be accepting. Clearly, every regular language recognized by an automaton $M_1$ is also recognized by an automaton $M_2$ that keeps only the reachable states of $M_1$.

Letting $I$ be an input set, and letting $M$ be an automaton that recognizes a founded regular language over $\mathcal{P}(I)$, we call $M$ clean if

• $M$ contains no self-loops;

• $M$ does not read symbol $\emptyset$ from its start state; and,

• $M$ contains only reachable states.

By applying Lemmas 4.3 and 4.4 in order, and then keeping only the reachable states, any automaton recognizing a founded regular language can be converted to a clean one that recognizes the same language; and, the number of states is at most doubled compared to the original automaton (through Lemma 4.3).

For a clean automaton $M = (Q, \Sigma, \delta, q^s, F)$, we define the pair set of $M$, denoted $p(M)$, as the following set:

$$\{(q, S) \in Q \times \Sigma \mid q \neq q^s \text{ and } \exists q' \in Q \text{ with } q \in \delta(q', S)\}.$$

In words: the pair set contains the combinations in $M$ of a non-start state and an incoming symbol to that state.

Now, let $B$ be a monotone-regular behavior over an input set $I$ and an output set $O$. An automaton implementation for $B$ is a function $\mathcal{M}$ mapping each $x \in O$ to a clean automaton $\mathcal{M}(x)$ that recognizes a founded regular language $\mathcal{L}(x)$ over $\mathcal{P}(I)$ such that for each input string $\alpha$ over $\mathcal{P}(I)$,

$$x \in B(\alpha) \iff \alpha \text{ embeds a string } \beta \in \mathcal{L}(x).$$

Intuitively, an automaton implementation for $B$ is a prototype implementation that can later be converted to a neural network. The total pair count of $\mathcal{M}$, denoted $c(\mathcal{M})$, is defined as

$$c(\mathcal{M}) = \sum_{x \in O} |p(\mathcal{M}(x))|.$$


4.2.2 Lower Bound Result

Theorem 4.5. Every monotone-regular behavior $B$ can be implemented by a positive neural network with delay 1. In particular, each automaton implementation $\mathcal{M}$ for $B$ can be converted to a positive neural network that implements $B$ with delay 1 and that has $c(\mathcal{M})$ auxiliary neurons.^16

Proof. Let $B$ be a monotone-regular behavior over an input set $I$ and an output set $O$. Let $\mathcal{M}$ be an automaton implementation for $B$. For each output neuron $x$, we translate automaton $\mathcal{M}(x)$ to a neural network. Roughly speaking, we translate state-symbol pairs of the automaton to neurons. A novel aspect is that each input symbol in our model consists of multiple input neurons. For this reason, our simulation of an automaton state by a neuron uses a nontrivial definition of presynaptic weights allowing us to simultaneously express (i) an “or” over auxiliary neurons that provide working memory, and (ii) an “and” over all input neurons mentioned in an input symbol. We use only rational weights. There is a delay of one time unit in the construction because the output neuron $x$ listens to neurons that simulate accept states of $\mathcal{M}(x)$.^17 See also the later remark in Section 4. The construction below is illustrated in Example 4.7.

For simplicity, we assume $|O| = 1$, and we denote $O = \{x\}$; for the case of multiple output neurons, the construction given below can be repeated, and the neural networks thus obtained can be united to form the overall desired network. Let $M = \mathcal{M}(x)$ and denote $M = (Q, \Sigma, \delta, q^s, F)$ where $\Sigma = \mathcal{P}(I)$. Recall that $M$ is clean.


Positive neural network. We now incrementally define the desired positive neural network $\mathcal{N} = (I, O, A, W)$ to implement $B$ with delay 1.

^16 Intuitively, the number of auxiliary neurons indicates that in general the constructed neural network preserves the nondeterminism, and thus the parallelism, of the automata in $\mathcal{M}$.

^17 An automaton itself does not introduce delay on string acceptance. In the construction of a neural network, however, all the different accept states should essentially be tunneled through a single output neuron. This requires in general a delay of one time unit (cf. Section 4.3).

Auxiliary neurons. First, we define the set of auxiliary neurons:

$$A = p(M),$$

where $p(M)$ is the pair set of $M$ as defined above. Intuitively, an auxiliary neuron $(q, S)$, where always $q \neq q^s$, represents the automaton state $q$ reached by reading input symbol $S$ from some previous state. We define the set $T \subseteq A$ of trigger neurons:

$$T = \{(q, S) \in A \mid q \in \delta(q^s, S)\}.$$

Intuitively, the neurons in $T$ are the first auxiliary neurons that become activated by the input; these neurons simulate the event of reading an input symbol from the start state of automaton $M$. Note that for each $(q, S) \in T$ we have $S \neq \emptyset$ because $\delta(q^s, \emptyset) = \emptyset$ by assumption on $M$.

For each $(q, S) \in A \setminus T$, we define the set $\mathit{con}(q, S)$ of context neurons of $(q, S)$ as follows:

$$\mathit{con}(q, S) = \{(q', S') \in A \mid q \in \delta(q', S)\}.$$

Intuitively, $\mathit{con}(q, S)$ is the set of auxiliary neurons that recognize prefixes of the strings that neuron $(q, S)$ should recognize, i.e., $\mathit{con}(q, S)$ is the working memory from the viewpoint of $(q, S)$. In the definition of $\mathit{con}(q, S)$, there is no relationship between the symbols $S$ and $S'$. Note that for each $(q, S) \in A \setminus T$, the set $\mathit{con}(q, S)$ is always nonempty because $M$ contains only reachable states.^18

Weights. The design of the connection weights is an intricate part of the construction. For this reason, we devote some attention to the underlying design process. Suppose we have an auxiliary neuron $y = (q, S)$ that should listen to a context $\mathit{con}(q, S) = \{z_1, \ldots, z_m\}$ of auxiliary neurons and to the input symbol $S = \{u_1, \ldots, u_n\}$. We desire weights $w_1$ and $w_2$, where $w_1$ is assigned to each connection $(z_i, y)$ with $i \in \{1, \ldots, m\}$ and $w_2$ to each connection $(u_j, y)$ with $j \in \{1, \ldots, n\}$, such that the following three properties are satisfied: (i) $y$ is not activated if all of $\{z_1, \ldots, z_m\}$ are activated but not yet all of $\{u_1, \ldots, u_n\}$; (ii) $y$ is already activated if at least one $z \in \{z_1, \ldots, z_m\}$ is activated while all of $\{u_1, \ldots, u_n\}$ are activated; and, (iii) $y$ is not activated if only all of $\{u_1, \ldots, u_n\}$ are activated. This assignment of weights corresponds to the earlier announced “or” and “and”, over $\{z_1, \ldots, z_m\}$ and $\{u_1, \ldots, u_n\}$ respectively.

The above desired properties (i), (ii), and (iii) are satisfied by the following weight functions $w_1$ and $w_2$ that are parameterized by the set cardinalities $m$ and $n$, denoting $\mathbb{N}_0 = \mathbb{N} \setminus \{0\}$,

$$w_1 : \mathbb{N}_0 \times \mathbb{N}_0 \to [0, 1] : w_1(m, n) = 1/(n \cdot m + 1),$$

$$w_2 : \mathbb{N}_0 \times \mathbb{N}_0 \to [0, 1] : w_2(m, n) = m/(n \cdot m + 1).$$

The design of these functions is documented in the appendix. The satisfaction of the desired properties is now formalized by the following observations:
the desired properties is now formalized by the following observations: 

^18 Indeed, since $(q, S) \in A$, there is a reachable state $q' \in Q$ with $q \in \delta(q', S)$. But $(q, S) \notin T$ implies $q' \neq q^s$, causing $(q', S') \in A$ for some $S' \in \mathcal{P}(I)$. Hence, $(q', S') \in \mathit{con}(q, S)$.

Claim 4.6. Letting $m, n \in \mathbb{N}_0$,

• $m \cdot w_1(m, n) + (n - 1) \cdot w_2(m, n) < 1$;

• $w_1(m, n) + n \cdot w_2(m, n) \ge 1$;

• $n \cdot w_2(m, n) < 1$.
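The three inequalities of Claim 4.6 follow from short calculations: the first and third left-hand sides both equal $n \cdot m/(n \cdot m + 1)$, and the second equals exactly 1. As a sanity check (not a proof), the sketch below evaluates them with exact rational arithmetic over a finite range; the bound 30 and the names `w1`, `w2`, `claim_holds`, and `ALL_OK` are our own assumptions.

```python
from fractions import Fraction


def w1(m, n):
    """Weight on each connection from a context neuron."""
    return Fraction(1, n * m + 1)


def w2(m, n):
    """Weight on each connection from an input neuron of the symbol."""
    return Fraction(m, n * m + 1)


def claim_holds(m, n):
    """Evaluate the three inequalities of the claim for given m, n >= 1."""
    context_alone_fails = m * w1(m, n) + (n - 1) * w2(m, n) < 1
    one_context_suffices = w1(m, n) + n * w2(m, n) >= 1
    inputs_alone_fail = n * w2(m, n) < 1
    return context_alone_fails and one_context_suffices and inputs_alone_fail


# Check a finite range of cardinalities only.
ALL_OK = all(claim_holds(m, n) for m in range(1, 30) for n in range(1, 30))
```

For example, with $m = 2$ context neurons and $n = 3$ input neurons, each context connection gets weight 1/7 and each input connection gets weight 2/7, so one context neuron plus all three inputs reaches the threshold exactly.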

Next, we can define the weights for all connections. We define the weight function $W$ from the perspective of the neurons in $A \cup \{x\}$, where any unmentioned weights are assumed to be zero:

• for the output neuron $x$, and each $(q, S) \in A$ where $q$ is an accepting state of $M$ (i.e., $q \in F$), we define

$$W((q, S), x) = 1;$$

• for each $(q, S) \in T$ and each $y \in S$, letting $n = |S|$, we define

$$W(y, (q, S)) = 1/n;$$

• for each $(q, S) \in A \setminus T$ with $S = \emptyset$, for each $y \in \mathit{con}(q, S)$, we define

$$W(y, (q, S)) = 1;$$

• for each $(q, S) \in A \setminus T$ with $S \neq \emptyset$, letting $m = |\mathit{con}(q, S)|$ and $n = |S|$, for each $y \in \mathit{con}(q, S)$, we define

$$W(y, (q, S)) = w_1(m, n),$$

and for each $z \in S$, we define

$$W(z, (q, S)) = w_2(m, n);$$

note in this case that $m > 0$ and $n > 0$.

Intuitively, the role of neurons $(q, S) \in A \setminus T$ with $S = \emptyset$ is to propagate past memories forward in time, without requiring new activations of any input neurons.
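The definitions of auxiliary neurons, trigger neurons, context neurons, and weights can be assembled into the following sketch for a single output neuron. It rests on assumptions of our own: the clean automaton is given as a dictionary from (state, symbol) pairs to sets of successor states, symbols are `frozenset` objects, state names differ from input neuron names, and the output neuron is named "x". The function names are hypothetical, and the example automaton is only an illustration.

```python
from fractions import Fraction


def build_network(delta, start, accepting, out="x"):
    """Translate a clean automaton into weights and auxiliary neurons."""
    # Pair set p(M): non-start states paired with an incoming symbol.
    aux = {(q, S) for (_, S), targets in delta.items()
           for q in targets if q != start}
    # Trigger neurons: pairs reachable directly from the start state.
    trigger = {(q, S) for (q, S) in aux if q in delta.get((start, S), set())}
    weights = {}
    for (q, S) in aux:
        y = (q, S)
        if q in accepting:
            weights[(y, out)] = Fraction(1)
        if y in trigger:
            for u in S:  # S is nonempty here because the automaton is clean
                weights[(u, y)] = Fraction(1, len(S))
            continue
        # Context neurons: pairs whose state can move to q by reading S.
        context = {(q2, S2) for (q2, S2) in aux
                   if q in delta.get((q2, S), set())}
        m, n = len(context), len(S)
        for z in context:
            weights[(z, y)] = Fraction(1) if n == 0 else Fraction(1, n * m + 1)
        for u in S:
            weights[(u, y)] = Fraction(m, n * m + 1)
    return weights, aux


def network_output(weights, aux, out, alpha):
    """Simulate the network on alpha (threshold 1) and return its output."""
    active = set()
    for symbol in alpha:
        sources = active | set(symbol)
        active = {y for y in set(aux) | {out}
                  if sum((w for (z, t), w in weights.items()
                          if t == y and z in sources), Fraction(0)) >= 1}
    return active & {out}


# Illustrative clean automaton for the single string ({a}, {b}).
SYM_A, SYM_B = frozenset({"a"}), frozenset({"b"})
DELTA = {("s", SYM_A): {1}, (1, SYM_B): {2}}
NET_W, NET_AUX = build_network(DELTA, "s", {2})
```

On this automaton the sketch creates two auxiliary neurons, and the output neuron fires one time step after the pattern ({a}, {b}) has been shown, which matches a delay of one time unit.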

Correctness. We show that $\mathcal{N}$ implements $B$ with a delay of one time unit. Let $\alpha = (a_1, \ldots, a_n)$ be an input string over $\mathcal{P}(I)$. If $n = 1$ then $\mathcal{N}(\alpha) = \emptyset$, as desired, because the output neuron $x$ only listens to auxiliary neurons (that represent accepting states), which makes it impossible for $x$ to become activated on a string with just one symbol. Henceforth, suppose $n \ge 2$. We show that $\mathcal{N}(\alpha) = B(\alpha^{\le n-1})$.

Direction 1. Suppose $x \in \mathcal{N}(\alpha)$. The activation of $x$ means that there is a maximal chain of auxiliary neurons

$$(q_1, S_1), \ldots, (q_k, S_k)$$

that becomes activated when showing $\alpha$ to $\mathcal{N}$ (with $k \ge 1$), where $(q_1, S_1)$ is a trigger neuron; $(q_2, S_2), \ldots, (q_k, S_k)$ are non-trigger auxiliary neurons; $(q_{i-1}, S_{i-1})$ is a presynaptic neuron of $(q_i, S_i)$ for each $i \in \{2, \ldots, k\}$; and, $(q_k, S_k)$ has activated $x$ while the last input symbol was shown. Let $\beta = (S_1, \ldots, S_k)$. By design of the presynaptic weights of the auxiliary neurons (cf. Claim 4.6), we know that the symbols $S_1, \ldots, S_k$ effectively occur in $\alpha$, and more particularly that $\alpha^{\le n-1}$ embeds $\beta$. Next, we show that $M$ accepts $\beta$. Then, since $B$ is monotone-regular, the embedding of $\beta$ into $\alpha^{\le n-1}$ implies $x \in B(\alpha^{\le n-1})$.

Based on the above sequence of auxiliary neurons, the state sequence $q^s, q_1, \ldots, q_k$
forms an accepting run of $M$ on $\beta$:

• $q_1 \in \delta(q^s, S_1)$ because $(q_1, S_1)$ is a trigger neuron;

• for each $i \in \{2, \ldots, k\}$, we have $q_i \in \delta(q_{i-1}, S_i)$ because $(q_{i-1}, S_{i-1})$ is a
presynaptic neuron of $(q_i, S_i)$;

• $q_k$ must be an accepting state, because we assumed that neuron $(q_k, S_k)$ has
activated $x$.


Direction 2. Suppose $x \in B(a^{n-1})$. Because $B$ is monotone-regular,
$a^{n-1}$ embeds a string $\beta$ that is accepted by $M$. Denote $\beta = (S_1, \ldots, S_k)$. We
consider an accepting run $q^s, q_1, \ldots, q_k$ of $M$ on $\beta$. The string $\beta$ can be chosen
so that $q^s \notin \{q_1, \ldots, q_k\}$. We now consider the following sequence of auxiliary
neurons: $(q_1, S_1), \ldots, (q_k, S_k)$. We show that this sequence of auxiliary neurons
becomes active in the last $k$ steps of $\mathcal{N}$ on input $a^{n-1}$. Let $N_1, \ldots, N_n$ be the
sequence of sets of activated neurons while running $\mathcal{N}$ on input $a^{n-1}$, where
$N_1 = \emptyset$. We show (by induction) for each $i \in \{1, \ldots, k\}$ that $(q_i, S_i) \in N_{n-k+i}$.
This results in $(q_k, S_k) \in N_n$, and because $(q_k, S_k)$ simulates an accepting state,
on the full string $a$ we thus obtain $x \in \mathcal{N}(a)$, as desired.

Before we continue, note that the embedding of $\beta$ into $a^{n-1}$ concretely means
$S_i \subseteq a_{n-1-k+i}$ for each $i \in \{1, \ldots, k\}$. For the base case, we see that $(q_1, S_1)$ is a
trigger neuron because $q_1 \in \delta(q^s, S_1)$. So, $S_1 \subseteq a_{n-k}$ implies $(q_1, S_1) \in N_{n-k+1}$.

For the inductive step, we assume $(q_{i-1}, S_{i-1}) \in N_{n-k+i-1}$ where $i \in \{2, \ldots, k\}$.
We show that $(q_i, S_i) \in N_{n-k+i}$. If $(q_i, S_i)$ is a trigger neuron then a similar
reasoning applies as in the base case, using that $S_i \subseteq a_{n-1-k+i}$. If $(q_i, S_i)$ is not a
trigger neuron then $(q_{i-1}, S_{i-1}) \in con(q_i, S_i)$ because $q_i \in \delta(q_{i-1}, S_i)$ and $q_{i-1} \neq q^s$,
and we distinguish between the following two cases:

• Suppose $S_i = \emptyset$. Then the connection weight from $(q_{i-1}, S_{i-1})$ to $(q_i, S_i)$ was
set to 1, and the activation $(q_{i-1}, S_{i-1}) \in N_{n-k+i-1}$ implies the activation
$(q_i, S_i) \in N_{n-k+i}$.

• Suppose $S_i \neq \emptyset$. In that case, the presynaptic weight design of $(q_i, S_i)$ with
functions $w_1$ and $w_2$ (cf. Claim 4.6), applied to the presynaptic activations
$(q_{i-1}, S_{i-1}) \in N_{n-k+i-1}$ and $S_i \subseteq a_{n-1-k+i} = a_{n-k+i-1}$, gives the activation
$(q_i, S_i) \in N_{n-k+i}$.

□

Footnote (on $q_i \in \delta(q_{i-1}, S_i)$ in Direction 1): From the definition of presynaptic neuron, we know that the connection from $(q_{i-1}, S_{i-1})$ to
$(q_i, S_i)$ has a strictly positive weight. This weight could only have been defined if $(q_{i-1}, S_{i-1}) \in con(q_i, S_i)$.

Footnote 20 (on choosing $\beta$ so that $q^s \notin \{q_1, \ldots, q_k\}$): If $q^s = q_i$ for some $i \in \{1, \ldots, k\}$ then $q_i, q_{i+1}, \ldots, q_k$ is an accepting run on the suffix
$\beta' = (S_{i+1}, \ldots, S_k)$, and we could instead focus on the smaller string $\beta'$ that is also embedded
into $a^{n-1}$.

Footnote (on the sequence $(q_1, S_1), \ldots, (q_k, S_k)$ in Direction 2): These are valid auxiliary neurons because (i) $q^s \notin \{q_1, \ldots, q_k\}$ by assumption; and
(ii) because $q^s, q_1, \ldots, q_k$ is an accepting run, we have $q_1 \in \delta(q^s, S_1)$ and $q_i \in \delta(q_{i-1}, S_i)$ for
each $i \in \{2, \ldots, k\}$.


Example 4.7. We illustrate the construction of the proof of Theorem 4.5. Let $\mathcal{I}$
consist of four distinct input neurons $a$, $b$, $c$, and $d$. Let $\mathcal{O} = \{x\}$. We define the
following input symbols: $S_1 = \{a,b,c\}$, $S_2 = \{b,c\}$, and $S_3 = \{a,d\}$.

Consider the clean automaton $M$ depicted in Figure 1a that recognizes a
founded regular language over $\mathcal{P}(\mathcal{I})$; we denote this language as $\mathcal{L}(x)$. Language
$\mathcal{L}(x)$ is infinite because of the loop between states $q_1$ and $q_2$ over symbol $S_2$. In
particular, $\mathcal{L}(x)$ contains all strings of the form $(S_1, S_2^*, S_3)$, where $S_2^*$ denotes an
arbitrary number of repetitions of symbol $S_2$. Let $B$ be the monotone-regular
behavior over $\mathcal{I}$ and $\mathcal{O}$ defined by $\mathcal{L}(x)$: for each input string $a$ over $\mathcal{P}(\mathcal{I})$,

$x \in B(a) \Leftrightarrow a$ embeds a string $\beta \in \mathcal{L}(x)$.

Applying the transformation in the proof of Theorem 4.5 to automaton $M$ results
in the positive neural network $\mathcal{N}$ depicted in Figure 1b, where input neurons
are indicated by boxes and the nonzero (rational) edge weights are written at the
end of a connection. Auxiliary neuron $(q_1, S_1)$ is the only trigger neuron; it listens
for symbol $S_1$. Note that the loop between states $q_1$ and $q_2$ of $M$ is preserved as
a loop between the auxiliary neurons $(q_1, S_2)$ and $(q_2, S_2)$. We can also see, for
example, that the neuron $(q_3, S_3)$ is only activated at time $t \in \mathbb{N}$ when at time $t - 1$
both input neurons $a$ and $d$ are active and at least one of the auxiliary neurons
$(q_1, S_1)$, $(q_2, S_2)$, and $(q_1, S_2)$ is active; these auxiliary neurons may be viewed as working
memory, representing the recognition of prefixes of the desired strings. □


Remark 4.8. In the proof of Theorem 4.5, it is possible to replace the or-and
construction of weight functions $w_1$ and $w_2$ by a two-stage process, at the cost of an
additional delay of one time unit. If we ignore this additional delay, the resulting
construction is similar to the one described by Sima and Wiedermann (1998) in
their Theorem 4.1, for the setting with one input neuron, with the difference that
we only use positive weights and are thus expressing monotone-regular behaviors.

Footnote: In Figure 1a we use the standard notations (Hopcroft and Ullman, 1979; Sipser, 2006): the
start state has an entering arrow with no source, and accepting states are indicated with double
circles.
(a) A clean automaton recognizing a founded regular language. The input
neurons are $a$, $b$, $c$, and $d$; the considered input symbols are $S_1 = \{a,b,c\}$,
$S_2 = \{b,c\}$, and $S_3 = \{a,d\}$.

(b) The positive neural network obtained from the automaton in Figure 1a.
The boxes represent the input neurons $a$, $b$, $c$, and $d$. The output neuron
is $x$. The remaining neurons are auxiliary.

Figure 1: The automaton and positive neural network of Example 4.7.
Concretely, for each symbol $S \in \mathcal{P}(\mathcal{I})$ used by the automaton, with $S \neq \emptyset$, we
introduce a preprocessor neuron $y_S$ having the following presynaptic weight for
each $u \in S$, where $n = |S|$:

$\mathcal{W}(u, y_S) = 1/n$.

So, neuron $y_S$ will only be activated when all neurons of $S$ are activated. Next,
each auxiliary neuron $(q,S) \in \mathcal{A}$ with $S \neq \emptyset$ is configured to read the preprocessor
neuron $y_S$ instead of the input neurons in $S$ directly:

• if $(q,S) \in \mathcal{T}$ then we define $\mathcal{W}(y_S, (q,S)) = 1$;

• if $(q,S) \in \mathcal{A} \setminus \mathcal{T}$ with $S = \emptyset$ then for each $z \in con(q,S)$ we define
$\mathcal{W}(z, (q,S)) = 1$ as before;

• if $(q,S) \in \mathcal{A} \setminus \mathcal{T}$ with $S \neq \emptyset$, letting $m = |con(q,S)|$, we define

$\mathcal{W}(y_S, (q,S)) = m/(m+1)$,

and for each $z \in con(q,S)$,

$\mathcal{W}(z, (q,S)) = 1/(m+1)$.

The total implementation delay now becomes two time units: (i) trigger neurons
listen to the above preprocessor neurons, and (ii) the output neurons listen to
auxiliary neurons that simulate accept states as before. We should point out,
however, that the construction by Sima and Wiedermann (1998) only incurs a
delay of one time unit because in their setting there is only one input neuron; so,
in that setting, all the above preprocessor neurons can be conceptually merged
into the single input neuron. □
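
The two-stage weights of Remark 4.8 can be sanity-checked by brute force. The sketch below assumes that a neuron becomes active when the summed weight of its active presynaptic neurons is at least 1 (the threshold used in the inequalities of this article); the function names are hypothetical. It confirms that a non-trigger neuron with weight $m/(m+1)$ from the preprocessor and $1/(m+1)$ from each of its $m$ memory neurons fires exactly when the preprocessor and at least one memory neuron are active.

```python
from fractions import Fraction
from itertools import product

def fires(m: int, pre_active: bool, num_con_active: int) -> bool:
    """Threshold-1 activation of a non-trigger neuron (q, S), S nonempty,
    with weight m/(m+1) from the preprocessor y_S and weight 1/(m+1) from
    each of the m neurons in con(q, S)."""
    total = Fraction(m, m + 1) * int(pre_active)
    total += Fraction(1, m + 1) * num_con_active
    return total >= 1

def behaves_like_and_of_or(m: int) -> bool:
    # Expected behavior: y_S AND (at least one neuron of con(q, S)).
    return all(
        fires(m, pre, k) == (pre and k >= 1)
        for pre, k in product([False, True], range(m + 1))
    )

print(all(behaves_like_and_of_or(m) for m in range(1, 50)))  # True
```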


4.3 Separation 

Regarding the expressivity of positive neural networks, the upper bound
(Theorem 4.1) and the lower bound (Theorem 4.5) do not coincide. Indeed, as illustrated
by the following two examples, there are simple monotone-regular behaviors that
cannot be implemented with zero delay. The main intuition in these examples is
that the fast reaction speed demanded by zero delay forces too much responsibility
on the output neuron, causing this neuron to be erroneously activated. Each
example illustrates a different kind of error.


Example 4.9. Let $S_1$ and $S_2$ be two disjoint sets of neurons with $|S_1| \geq 2$ and
$|S_2| \geq 2$. Let $\mathcal{I} = S_1 \cup S_2$ and $\mathcal{O} = \{x\}$. Let $\mathcal{L}(x)$ be the following founded regular
language over $\mathcal{P}(\mathcal{I})$:

$\mathcal{L}(x) = \{(S_1), (S_2)\}$.

So, $\mathcal{L}(x)$ is a finite language containing two one-symbol strings. Let $B$ be the
following monotone-regular behavior over $\mathcal{I}$ and $\mathcal{O}$ defined by $\mathcal{L}(x)$: for each input
string $a$ over $\mathcal{P}(\mathcal{I})$, we define

\[
B(a) = \begin{cases} \{x\} & \text{if } a \text{ embeds a string } \beta \in \mathcal{L}(x); \\ \emptyset & \text{otherwise.} \end{cases}
\]

Footnote: An automaton recognizing $\mathcal{L}(x)$ could have two accepting states $q_1$ and $q_2$ besides the start
state $q^s$: reading symbol $S_i$ from $q^s$ leads to $q_i$ for $i \in \{1, 2\}$.

We show that there is no positive neural network that implements $B$ with zero
delay. Towards a contradiction, suppose there is such a neural network $\mathcal{N}$. We
show that the connections from $S_1$ to $x$ and the connections from $S_2$ to $x$ interfere
with each other, causing $x$ to also be triggered on wrong input symbols.

Because $\mathcal{N}$ implements $B$ with zero delay, we have $\mathcal{N}(a) = B(a)$ for all input
strings $a$ over $\mathcal{P}(\mathcal{I})$. In particular, $\mathcal{N}((S_1)) = \{x\}$ and $\mathcal{N}((S_2)) = \{x\}$. These
fast output reactions imply that neuron $x$ does not rely on auxiliary neurons, and
instead reads input neurons directly. So,

\[
\sum_{u \in S_1} \mathcal{W}(u, x) \geq 1, \text{ and} \tag{4.1}
\]

\[
\sum_{u \in S_2} \mathcal{W}(u, x) \geq 1. \tag{4.2}
\]

We distinguish between the following cases:

• Suppose there exist some $y \in S_1$ and $z \in S_2$ such that

$\mathcal{W}(y, x) + \mathcal{W}(z, x) \geq 1$.

Define the symbol $S = \{y, z\}$. Note that $S \in \mathcal{P}(\mathcal{I})$. Because $|S_1| \geq 2$ and
$|S_2| \geq 2$, we have $S_1 \not\subseteq S$ and $S_2 \not\subseteq S$. Note that, by choice of $y$ and $z$,

\[
\sum_{u \in S} \mathcal{W}(u, x) \geq 1.
\]

So, $\mathcal{N}((S)) = \{x\}$. But the string $(S)$ does not embed a string from $\mathcal{L}(x)$,
giving $B((S)) = \emptyset$. Hence, $\mathcal{N}((S)) \neq B((S))$, which is a contradiction.

• If the first case does not hold, then we can choose some $y \in S_1$ and $z \in S_2$
for which

$\mathcal{W}(y, x) + \mathcal{W}(z, x) < 1$.

Define the symbol $S = \mathcal{I} \setminus \{y, z\}$. Note that $S \in \mathcal{P}(\mathcal{I})$. Because $y \in S_1$ and
$z \in S_2$, we have $S_1 \not\subseteq S$ and $S_2 \not\subseteq S$. Moreover,

\[
\sum_{u \in S} \mathcal{W}(u, x) = \sum_{u \in S_1} \mathcal{W}(u, x) + \sum_{u \in S_2} \mathcal{W}(u, x) - \mathcal{W}(y, x) - \mathcal{W}(z, x).
\]

By using inequalities (4.1) and (4.2) from above, and $\mathcal{W}(y,x) + \mathcal{W}(z,x) < 1$,
we can further obtain:

\[
\sum_{u \in S} \mathcal{W}(u, x) \geq 2 - (\mathcal{W}(y, x) + \mathcal{W}(z, x)) > 1.
\]

So, $\mathcal{N}((S)) = \{x\}$. But the string $(S)$ does not embed a string from $\mathcal{L}(x)$,
giving $B((S)) = \emptyset$. Again, $\mathcal{N}((S)) \neq B((S))$, which is a contradiction.

□
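
The case analysis of Example 4.9 is constructive: from any weights satisfying (4.1) and (4.2) it produces a symbol that wrongly activates $x$. The following minimal sketch mirrors the two cases; the function names and the dictionary representation of the weights $\mathcal{W}(u, x)$ are illustrative assumptions, and the inputs are assumed to be disjoint sets of size at least 2.

```python
from fractions import Fraction

def total(weights, symbol):
    # Summed weight into x of the input neurons in `symbol`.
    return sum((weights[u] for u in symbol), Fraction(0))

def wrong_symbol(weights, s1, s2):
    """Return a symbol that activates x although it contains neither s1 nor s2.
    Assumes s1, s2 are disjoint, each of size >= 2, and satisfy (4.1), (4.2)."""
    assert total(weights, s1) >= 1 and total(weights, s2) >= 1
    # Case 1: a single pair (y, z) already reaches the threshold.
    for y in sorted(s1):
        for z in sorted(s2):
            if weights[y] + weights[z] >= 1:
                return {y, z}
    # Case 2: every pair stays below 1, so removing one neuron from each
    # set still leaves a total weight above 1.
    y, z = min(s1), min(s2)
    return (set(s1) | set(s2)) - {y, z}

def is_counterexample(weights, s1, s2, symbol):
    return total(weights, symbol) >= 1 and not s1 <= symbol and not s2 <= symbol

# Case 2 in action: three neurons per set, each with weight 2/5.
t1, t2 = {"a", "b", "c"}, {"d", "e", "f"}
light = {u: Fraction(2, 5) for u in "abcdef"}
bad = wrong_symbol(light, t1, t2)
print(sorted(bad), is_counterexample(light, t1, t2, bad))
```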

Example 4.10. Let $S_1$, $S_2$, $S_3$, and $S_4$ be nonempty sets of neurons that are
pairwise disjoint. Let $\mathcal{I} = \bigcup_{i=1}^{4} S_i$ and $\mathcal{O} = \{x\}$. Let $\mathcal{L}(x)$ be the following
founded regular language over $\mathcal{P}(\mathcal{I})$:

$\mathcal{L}(x) = \{(S_1, S_2), (S_3, S_4)\}$.

Let $B$ be the monotone-regular behavior over $\mathcal{I}$ and $\mathcal{O}$ defined by $\mathcal{L}(x)$: for each
input string $a$ over $\mathcal{P}(\mathcal{I})$,

\[
B(a) = \begin{cases} \{x\} & \text{if } a \text{ embeds a string } \beta \in \mathcal{L}(x); \\ \emptyset & \text{otherwise.} \end{cases}
\]

We show there is no positive neural network that implements $B$ with zero delay.
Towards a contradiction, suppose there is such a network $\mathcal{N} = (\mathcal{I}, \mathcal{O}, \mathcal{A}, \mathcal{W})$. We
show that $\mathcal{N}$ erroneously activates the output neuron on the input string $(S_1, S_4)$
or on the input string $(S_3, S_2)$. Intuitively, the output neuron $x$ confuses the
memory contexts emerging from symbols $S_1$ and $S_3$.

Because $\mathcal{N}$ implements $B$ with zero delay, we have $\mathcal{N}(a) = B(a)$ for all input
strings $a$ over $\mathcal{P}(\mathcal{I})$. In particular, $\mathcal{N}((S_1, S_2)) = \{x\}$ and $\mathcal{N}((S_3, S_4)) = \{x\}$.
Let $\mathcal{A}_1 \subseteq \mathcal{A}$ denote the set of auxiliary neurons activated after reading the string
$(S_1)$. Similarly, let $\mathcal{A}_3 \subseteq \mathcal{A}$ denote the set of auxiliary neurons activated after
reading the string $(S_3)$. Denote, for $i \in \{1, 3\}$,

\[
w_i = \sum_{y \in \mathcal{A}_i} \mathcal{W}(y, x).
\]

Also denote, for $i \in \{2, 4\}$,

\[
w_i = \sum_{y \in S_i} \mathcal{W}(y, x).
\]

Now, the output activations $\mathcal{N}((S_1, S_2)) = \{x\}$ and $\mathcal{N}((S_3, S_4)) = \{x\}$ imply

$w_1 + w_2 \geq 1$, and

$w_3 + w_4 \geq 1$.

We distinguish between the following cases:

• Suppose $w_1 + w_4 \geq 1$. This implies $\mathcal{N}((S_1, S_4)) = \{x\}$. But then $\mathcal{N}((S_1, S_4)) \neq
B((S_1, S_4))$, which is a contradiction.

• In the other case, we have $w_1 + w_4 < 1$. Together with $w_3 + w_4 \geq 1$ from
above, we see that $w_3 > w_1$. Combining $w_3 > w_1$ and $w_1 + w_2 \geq 1$ from
above, we obtain $w_3 + w_2 > 1$. This implies $\mathcal{N}((S_3, S_2)) = \{x\}$. But then
$\mathcal{N}((S_3, S_2)) \neq B((S_3, S_2))$, which is a contradiction.

□
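
The two cases of Example 4.10 can be traced with concrete numbers. The sketch below is only an illustration of the inequality argument: the four arguments stand for the aggregated weights $w_1, \ldots, w_4$ of the example, and the function name and the returned labels are hypothetical.

```python
from fractions import Fraction

def wrongly_accepted_string(w1, w2, w3, w4):
    """Given w1 + w2 >= 1 and w3 + w4 >= 1, return the labels of a
    two-symbol string outside the language that still activates x."""
    assert w1 + w2 >= 1 and w3 + w4 >= 1
    if w1 + w4 >= 1:
        return ("S1", "S4")
    # Here w1 + w4 < 1 <= w3 + w4, hence w3 > w1, and therefore
    # w3 + w2 > w1 + w2 >= 1.
    assert w3 + w2 > 1
    return ("S3", "S2")

half = Fraction(1, 2)
print(wrongly_accepted_string(half, half, half, half))
print(wrongly_accepted_string(Fraction(1, 4), Fraction(3, 4),
                              Fraction(3, 4), Fraction(1, 4)))
```

With all four values equal to 1/2 the first case applies; with (1/4, 3/4, 3/4, 1/4) the second case applies, since $w_1 + w_4 = 1/2$.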

Footnote: An automaton recognizing this language could split its computation into two branches from
the start state: one branch recognizes the string $(S_1, S_2)$ and the other branch recognizes the
string $(S_3, S_4)$.

Footnote: Although $\mathcal{N}((S_1, S_2)) = \{x\}$ and $\mathcal{N}((S_2)) = B((S_2)) = \emptyset$ imply that $w_1 > 0$, the proof
does not really use this fact. Similarly, $w_3 > 0$, but the proof does not use this fact.
4.4 On Zero Delay


The earlier Example 4.2 has provided a zero delay implementation for monotone-regular
behaviors whose underlying founded regular language contains only one
string. Here, we present a larger class of monotone-regular behaviors that can be
implemented with zero delay. First, we call a regular language $\mathcal{L}$ converging if all
strings in $\mathcal{L}$ end with the same symbol. The following result demonstrates that
even monotone-regular behaviors whose underlying founded regular languages are
infinite can sometimes be implemented with zero delay:

Theorem 4.11. Every monotone-regular behavior where the founded regular language
of each output neuron is also converging can be implemented by a positive
neural network with zero delay.

Proof. Let $B$ be a monotone-regular behavior over an input set $\mathcal{I}$ and an output
set $\mathcal{O}$ where the founded regular language of each output neuron is also converging.
Let $\mathcal{M}$ be an automaton implementation for $B$. As in the proof of Theorem 4.5,
we fix some $x \in \mathcal{O}$. Let $\mathcal{L}(x)$ be the language recognized by $\mathcal{M}(x)$. Denote
$\mathcal{M}(x) = (Q, \Sigma, \delta, q^s, F)$, where $\Sigma = \mathcal{P}(\mathcal{I})$. We can modify the construction in the
proof of Theorem 4.5 as follows.


First, we define the set $V$ of all state-symbol combinations that lead to an
accepting state:

$V = \{(q,S) \in Q \times \mathcal{P}(\mathcal{I}) \mid \delta(q,S) \cap F \neq \emptyset\}$.

Because $\mathcal{L}(x)$ is converging, there is one symbol $S \in \mathcal{P}(\mathcal{I})$ such that $S = S_i$ for
each $(q_i, S_i) \in V$. We refer to $S$ as the terminal symbol. The only difference
compared to the proof of Theorem 4.5 is that we now let output neuron $x$ listen
to (i) the symbol $S$ directly and (ii) a different set $C$ of auxiliary neurons. Letting
$\mathcal{A}$ be the set of auxiliary neurons as defined in the proof of Theorem 4.5, we define

$C = \{(q,S') \in \mathcal{A} \mid (q,S) \in V\}$.


We now specify the presynaptic weights for $x$, depending on symbol $S$:

• Suppose $S = \emptyset$. We still have $C \neq \emptyset$: there is always a string $a \in \mathcal{L}(x)$
ending with $S$, for which there is an accepting run $q_1, \ldots, q_{n+1}$ where
$q_{n+1} \in \delta(q_n, S) \cap F$; and $q_n \neq q^s$ because $\mathcal{M}(x)$ does not read $S = \emptyset$ from
its start state, implying $(q_n, S'') \in C$ for some $S'' \in \mathcal{P}(\mathcal{I})$. Now, for each
$y \in C$, we define

$\mathcal{W}(y, x) = 1$.

• Suppose $S \neq \emptyset$. If $C = \emptyset$ then $x$ only has to detect symbol $S$; accordingly,
letting $n = |S|$, for each $z \in S$, we define

$\mathcal{W}(z, x) = 1/n$.

If $C \neq \emptyset$, then we reuse the or-and construction with weight functions $w_1$
and $w_2$; concretely, letting $m = |C|$ and $n = |S|$, for $y \in C$, we define

$\mathcal{W}(y, x) = w_1(m, n)$,

and for each $z \in S$, we define

$\mathcal{W}(z, x) = w_2(m, n)$.

All other connections from auxiliary neurons to $x$ are set to zero. So, instead of
listening to auxiliary neurons that simulate accept states, the output neuron $x$ (i)
listens to auxiliary neurons that simulate the states preceding accept states, and
(ii) also verifies that the terminal symbol $S$ effectively occurs. □

Footnote (on the uniqueness of the terminal symbol): For each $(q_i, S_i) \in V$, there is an input string $a$ over $\mathcal{P}(\mathcal{I})$ and a run of $\mathcal{M}(x)$ on $a$ ending
with $q_i$ because $q_i$ is a reachable state by assumption on $\mathcal{M}(x)$. Since $(q_i, S_i) \in V$, the extension
of $a$ with $S_i$ belongs to $\mathcal{L}(x)$. So, for any $(q_1, S_1) \in V$ and $(q_2, S_2) \in V$, there are strings in $\mathcal{L}(x)$
ending with $S_1$ and $S_2$; but convergence of $\mathcal{L}(x)$ implies $S_1 = S_2$.


The following example demonstrates that the converse of Theorem 4.11 does 
not hold, so we do not yet have a precise characterization of the monotone-regular 
behaviors that can be implemented with zero delay. 


Example 4.12. Let $S_1 = \{a,b\}$ and $S_2 = \{b,c\}$ where $a$, $b$, and $c$ are pairwise
different neurons. Let $\mathcal{I} = S_1 \cup S_2$ and $\mathcal{O} = \{x\}$. Let $\mathcal{L}(x)$ be the following
founded regular language over $\mathcal{P}(\mathcal{I})$:

$\mathcal{L}(x) = \{(S_1), (S_2)\}$.

Note that $\mathcal{L}(x)$ is not converging. Let $B$ be the monotone-regular behavior over
$\mathcal{I}$ and $\mathcal{O}$ defined by $\mathcal{L}(x)$: for each input string $a$ over $\mathcal{P}(\mathcal{I})$,

\[
B(a) = \begin{cases} \{x\} & \text{if } a \text{ embeds a string } \beta \in \mathcal{L}(x); \\ \emptyset & \text{otherwise.} \end{cases}
\]

The following positive neural network $\mathcal{N} = (\mathcal{I}, \mathcal{O}, \mathcal{A}, \mathcal{W})$ implements $B$ with
zero delay: $\mathcal{A} = \emptyset$, and

$\mathcal{W}(a, x) = 1/3$,
$\mathcal{W}(b, x) = 2/3$,
$\mathcal{W}(c, x) = 1/3$.

In contrast to Example 4.9, we cannot fool this network to trigger $x$ on a wrong
input symbol like $\{a, c\}$. That is because $\mathcal{W}$ assigns a heavier weight to connection
$(b, x)$, which renders the input neuron $b$ crucial for the activation of $x$. □
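
The claim of Example 4.12 can be verified exhaustively, because the network has no auxiliary neurons and the output therefore depends only on the most recent input symbol. The sketch below assumes the threshold-1 activation rule and takes "the input embeds $(S_1)$ or $(S_2)$" to mean that the last symbol is a superset of $S_1$ or of $S_2$; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

WEIGHTS = {"a": Fraction(1, 3), "b": Fraction(2, 3), "c": Fraction(1, 3)}
S1, S2 = {"a", "b"}, {"b", "c"}

def x_fires(symbol):
    # Threshold-1 activation of x on the current input symbol.
    return sum((WEIGHTS[u] for u in symbol), Fraction(0)) >= 1

def x_expected(symbol):
    # Desired behavior: the symbol contains S1 or S2.
    return S1 <= symbol or S2 <= symbol

# Enumerate all 8 subsets of {a, b, c} and collect disagreements.
all_symbols = [set(c) for r in range(4) for c in combinations("abc", r)]
mismatches = [s for s in all_symbols if x_fires(s) != x_expected(s)]
print(mismatches)  # []
```

In particular, the symbol $\{a, c\}$ only accumulates weight 2/3 and does not activate $x$.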


5 Conclusion and Future Work 

We have studied the expressivity of positive neural networks with multiple input 
neurons. Within the framework of monotone-regular behaviors, we have suggested 
both an upper and lower bound on the expressivity. These bounds do not coincide 
when we take into account the delay by which a behavior is implemented. We now 
discuss several avenues for further work. 
Single input neurons. If there is only a single input neuron, Sima and Wiedermann
(1998) show that all regular languages can be recognized by a neural
network with a delay of one time unit. Our article has shown a similar result for
monotone-regular behaviors, but in the case of multiple input neurons. It might
be interesting to better understand the relationship between these results.

Symbols over multiple input neurons could be translated to a single input 
neuron as follows: supposing there are n ordered input neurons, each subset of 
input neurons can be represented as a binary code over n bits. This way, each 
sequence of input symbols can be translated to a sequence of binary codes, and the 
resulting sequence may be viewed as a single bit string. However, this construction 
would increase output delay. Moreover, it is not clear if this technical construction 
can be achieved inside a positive neural network itself, because on every time step 
an entirely new symbol arrives over the multiple input neurons; the positive neural 
network might not be able to buffer the new symbols while it is translating the 
previous symbols. 
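
The translation from symbols over multiple input neurons to a single bit string can be sketched as follows. This is only an illustration of the encoding itself, performed outside any neural network; the function names and the chosen neuron order are assumptions.

```python
def encode_symbol(symbol, ordered_neurons):
    # One bit per input neuron, in the fixed order: 1 if active, 0 otherwise.
    return "".join("1" if u in symbol else "0" for u in ordered_neurons)

def encode_string(symbols, ordered_neurons):
    # Concatenate the n-bit codes of the successive input symbols.
    return "".join(encode_symbol(s, ordered_neurons) for s in symbols)

# Three symbols over the ordered input neurons a, b, c, d.
bits = encode_string([{"a", "b", "c"}, {"b", "c"}, {"a", "d"}], "abcd")
print(bits)  # 111001101001
```

Each input symbol expands to $n$ bits, which makes the increase in output delay mentioned above concrete: a single-input network needs $n$ time steps to read one original symbol.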


Characterizing zero delay. We have seen that seemingly simple monotone-regular
behaviors already require a delay of one time unit (Section 4.3). We have
also made some first steps towards identifying the class of monotone-regular behaviors
that can be implemented with zero delay (Section 4.4). However, a precise
characterization is missing. Example 4.12 suggests that in case of multiple terminal
symbols in the underlying regular languages, we could seek an assignment of
nonuniform weights to the input neurons. Perhaps the existence of such nonuniform
weights can be related to the syntactical properties of the accompanying
automata.


Minimal network size. Like previous complexity-theoretic analyses of neural
networks (Sima and Orponen, 2003), one could examine what minimal number of
auxiliary neurons is necessary for implementing certain monotone-regular behaviors.
Note that a lower bound on the number of states in an automaton implementation
of a behavior does not directly provide a lower bound on the number of
neurons, because clever design of the weights could perhaps pack more functionality
into fewer neurons than the number of automaton states (or symbol-state
combinations). Such efficient implementations were previously studied, e.g. by
Horne and Hush (1996) for the simulation of deterministic automata by recurrent
neural networks. For positive neural networks, it might be possible to explore the
relationship with (monotone) AND-OR boolean circuits, where Alon and Boppana
(1987) have previously obtained lower bounds on the number of gates (neurons)
for implementing certain boolean functions.


We should note, however, that some of the existing constructions, e.g. (Horne
and Hush, 1996), introduce delays in the processing of incoming input symbols
by the overall neural network. To compare such constructions with the results
regarding delay in this article, perhaps some of the constructed sub-circuits could
be viewed as being computed instantaneously, and would thus not contribute to
the overall delay.
Inhibition. Previous works on the expressive power of neural networks have often
assumed negative connection weights between neurons, allowing neurons to
inhibit the activation of their postsynaptic neurons (Sima and Orponen, 2003). It
is interesting to extend our work with this feature, but in such a way that it is still
biologically plausible. In particular, one should make a distinction between excitatory
and inhibitory neurons (Gerstner et al., 2014): the postsynaptic weights of
excitatory neurons are always positive and the postsynaptic weights of inhibitory
neurons are always negative. Both neuron types are used in winner-take-all circuits
(Kappel et al., 2014).


As suggested by the findings of Sima and Wiedermann (1998), inhibitory neurons
could allow the neural network to test for the explicit absence of input activations,
lifting the expressive power to “regular” behaviors that, in contrast to
monotone-regular behaviors, depend on very precise input symbols that are not
embedded in surrounding input noise. For example, a neural network might activate
an output neuron whenever the input symbol $\{a, b, c\}$ occurs in its pure form,
i.e., no other input neurons are active besides $a$, $b$, and $c$.

Another view is that inhibitory neurons have a stabilizing effect, at least in
a winner-take-all setting (Kappel et al., 2014): inhibitory neurons let the most
strongly recognized patterns survive; otherwise perhaps too many insignificant
pattern pieces will be floating around in the limited working memory.

Several biologically plausible topologies with inhibition may exist.
The expressivity of the resulting neural network models, including any results 
regarding delays, could strongly depend on the manner by which inhibitory and 
excitatory neurons are connected. 

Noise and continuous time. Noise is an important aspect of real biological
neurons (Gerstner et al., 2014), and it might be an important resource for expressing
nondeterministic computations (Maass, 2014). It would be interesting to
see how the results regarding regular languages can be extended to this framework.
One possibility is to study the quality by which a noisy positive neural network
approximates a true monotone-regular behavior. Here, quality might be formalized
as the probability of producing correct output activations given a certain
probability distribution on the noise.

Moreover, the model studied in this article is based on discrete time steps. 
Again, real-world neurons do not obey this restriction, so it appears interesting 
to investigate if our results can be extended to a setting with continuous time. 
However, the restriction to discrete time steps may enable an understanding of 
neurons that operate in continuous time by focusing on the causal relationships 
between neuron activations. From this viewpoint, regular languages could also 
provide insights into the workings of neurons operating in continuous time. 


Learning. An important aspect of biological neurons is that they modify their
presynaptic weights over time through a learning mechanism called STDP, which
depends on the relative timing of neuron activations (Gerstner et al., 2014). One
could for example consider reward-modulated STDP, where connection weights are
updated at some time point when the overall performance of the neural network
has recently improved (Gerstner et al., 2014). In a biologically plausible setting,
it seems intriguing to understand how overall behavior and consciousness could
emerge from dopamine neurons signaling reward to an organism (Schultz, 2013).

Footnote: The acronym “STDP” stands for spike-timing-dependent plasticity.


Forbidding recurrent connections. Weak recurrent connections in biological
neural networks might already be sufficient to provide an interaction of working
memory with new inputs (Buonomano and Maass, 2009). So, pure looping behavior
as needed in the recognition of regular languages might not really be needed
by an organism. In a further expressivity study, one could therefore simplify positive
neural networks by forbidding recurrent connections. This way, only finite regular
languages can be recognized. It seems interesting to understand the resulting
model from a practical perspective. In particular, one might verify if the resulting
networks are still useful for real-world tasks. It seems that memories of larger
stimuli require more neurons, and longer activation chains between those neurons.

Sharing auxiliary neurons. The construction for the expressivity lower bound
(Theorem 4.5) builds a separate network of auxiliary neurons for each output
neuron. In biological networks, multiple output neurons share a pool of auxiliary
neurons (Buonomano and Maass, 2009). It seems interesting to understand the
impact of sharing on the behaviors exhibited by the individual output neurons.


Multiple interconnected networks. In this article, we have investigated the
expressiveness of single networks where all neurons are directly connected to each
other. However, when the number of neurons increases, the number of direct
connections increases quadratically. This would become impractical to implement
in biological neural networks. Indeed, one hypothesis is that the brain is composed
of many small networks that are connected strongly internally, but perhaps only
weakly externally (Kappel et al., 2014). It is interesting to understand how such
an organization of the connections influences the expressivity.
A Design of the Weights (Claim 4.6)


Denote $\mathbb{N}_0 = \mathbb{N} \setminus \{0\}$. Let $m \in \mathbb{N}_0$ and $n \in \mathbb{N}_0$. Suppose we have two sets $Y$ and
$Z$ with $m = |Y|$ and $n = |Z|$. Both sets should form the presynaptic neurons of a
neuron $x$. We want to find weights $w_1$ and $w_2$, to be assigned to the neurons in $Y$
and $Z$ respectively, such that

1) $m \cdot w_1 + (n-1) \cdot w_2 < 1$;

2) $w_1 + n \cdot w_2 \geq 1$;

3) $n \cdot w_2 < 1$.

Condition 1 expresses that all neurons from $Z$ should be activated before $x$ may
be activated, regardless of how many neurons in $Y$ are activated. Condition 2
expresses that if all neurons in $Z$ are activated then a single neuron from $Y$ suffices
to activate $x$; but Condition 3 stipulates that at least one neuron of $Y$ should be
activated. So, neuron $x$ requires all neurons of $Z$ and just a single neuron from
$Y$. Our design of such weights is based on a denominator $f \in \mathbb{N}_0$:


$w_1 = 1/f$,

$w_2 = (1 - 1/f)/n$.

We see that Condition 2 is satisfied for any $f \in \mathbb{N}_0$:

$1/f + n \cdot ((1 - 1/f)/n) = 1/f + (1 - 1/f) = 1 \geq 1$.

Also, Condition 3 is satisfied for any $f \in \mathbb{N}_0$:

$n \cdot ((1 - 1/f)/n) = 1 - 1/f < 1$.

For Condition 1 we solve for $f$:

$m \cdot w_1 + (n-1) \cdot w_2 < 1$;

$m/f + (1 - 1/f) \cdot ((n-1)/n) < 1$;

$f > n(m-1) + 1$.

So, we can choose $f = n \cdot m + 1$.

Footnote (on the choice $f = n \cdot m + 1$): Because $n > 0$, we can make the following derivation: $m - 1 < m$; $n(m-1) < n \cdot m$;
$n(m-1) + 1 < n \cdot m + 1$.
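
The step from Condition 1 to the bound on $f$ can be spelled out as follows. This is a worked version of the algebra, using only the definitions $w_1 = 1/f$ and $w_2 = (1 - 1/f)/n$ and the fact that $f \cdot n > 0$; the intermediate lines are a reconstruction, not quoted from the original derivation.

```latex
\begin{align*}
\frac{m}{f} + \Bigl(1 - \frac{1}{f}\Bigr)\frac{n-1}{n} &< 1 \\
m n + (f-1)(n-1) &< f n
  && \text{(multiply both sides by } f n > 0\text{)} \\
m n + f n - f - n + 1 &< f n \\
m n - n + 1 &< f \\
n(m-1) + 1 &< f .
\end{align*}
```

Substituting $f = n \cdot m + 1$ gives $w_1 = 1/(n \cdot m + 1)$ and $w_2 = m/(n \cdot m + 1)$, which matches the weight functions used in Claim 4.6.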
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